Abstract

A stochastic ordering approach is applied with Stein's method for approxi-
mation by the equilibrium distribution of a birth-death process. The usual
stochastic order and the more general s-convex orders are discussed. Attention
is focused on Poisson and translated Poisson approximation of a sum of
dependent Bernoulli random variables, for example k-runs in i.i.d. Bernoulli
trials. Other applications include approximation by polynomial birth-death
distributions.

1. Introduction

Stein's method has proved to be an effective tool in probability approximation, 
and has the advantage of being applicable in the presence of dependence. See, for
example, Stein (1986), and Barbour and Chen (2005) for more recent developments.
It is well-known that error bounds obtained via Stein's method may be simplified
under some assumptions on the dependence present. For example, in the presence of
negative or positive relation, Stein's method gives simple error bounds in the Poisson
approximation of a sum of indicator random variables. This is exploited throughout
the work of Barbour et al. (1992), and will be returned to in our Section 4.

In this work, we consider the more general situation of approximation by the 
equilibrium distribution of a birth-death process, and examine the situations in which 
Stein's method leads to simple, easily calculable error bounds. These error bounds 
will typically be differences of moments of our random variables. As we will see, the 
assumptions under which we can obtain such error bounds are naturally phrased in 
terms of stochastic orderings. 

Consider a birth-death process on (some subset of) $\mathbb{Z}^+$ with birth rates $\alpha_j$ and death
rates $\beta_j$ for $j \ge 0$. Suppose $\beta_0 = 0$. Let $\pi$ be the stationary distribution of such a
process, with $\pi_j = P(\pi = j)$, $j \ge 0$. In this work we combine Stein's method with a
stochastic ordering construction to consider the approximation by $\pi$ of some random
variable $W$ on $\mathbb{Z}^+$.

Our random variable $\pi$ satisfies the identity $E[Ag(\pi)] = 0$ for any bounded function
$g : \mathbb{Z}^+ \to \mathbb{R}$, where $A$ is the linear operator defined by

$$Ag(j) = \alpha_j g(j+1) - \beta_j g(j), \quad j \ge 0. \tag{1}$$

$A$ is a characterising operator for $\pi$, in the sense that a random variable $Z =_d \pi$ if
and only if $E[Ag(Z)] = 0$ for all bounded $g$. The construction of such a characterising
operator is the basis of Stein's method for probability approximation. See the books by
Stein (1986), Barbour et al. (1992), Barbour and Chen (2005) and references therein.
For Stein's method applied to birth-death processes, see Brown and Xia (2001) and
Holmes (2004).

Given some test function $h$, the so-called Stein equation is defined by

$$h(j) - E[h(\pi)] = Af(j), \quad j \ge 0. \tag{2}$$

Its solution is denoted $f = f_h = Sh$. We call $S$ the Stein operator. Bounds on $S$ are
an essential ingredient of Stein's method.



Stein's method and stochastic orderings 






Note that the solution $f$ of the Stein equation depends on the chosen test function
$h$. However, for notational convenience in much of the work that follows we will write
$f$ rather than $f_h$ or $Sh$. We will often choose $h(j) = I_{(j \in B)}$ for some $B \subseteq \mathbb{Z}^+$, in which
case the solution $f$ will depend on the chosen set $B$.

There are several common distributions tt covered by this framework. For each 
of the examples below, bounds are available on the corresponding Stein operator S. 
Theorem 2.10 of Brown and Xia (2001) may also be applied to give bounds on S in 
many cases. 

• If $\alpha_j = \lambda$ and $\beta_j = j$, then $\pi \sim \mathrm{Po}(\lambda)$, the Poisson distribution with mean $\lambda$. See
Barbour et al. (1992) and references therein.

• If $\alpha_j = q(r + j)$ and $\beta_j = j$, then $\pi \sim \mathrm{NB}(r, 1 - q)$ has a negative binomial
distribution. See Brown and Phillips (1999).

• If $\alpha_j = (n - j)p$ and $\beta_j = (1 - p)j$, then $\pi \sim \mathrm{Bin}(n, p)$. See Ehm (1991).

• In the geometric case, we may, of course, use the negative binomial operator
above. Alternatively we may choose $\alpha_j = q$ and $\beta_j = I_{(j \ge 1)}$, so that $\pi \sim
\mathrm{Geom}(1 - q)$. See Peköz (1996).
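
As an illustrative aside, the correspondence between birth and death rates and the equilibrium law can be checked numerically. The sketch below is not part of the original development: the function names (`stationary_distribution`, `stein_expectation`) are hypothetical, the state space is truncated to finitely many states, and the detailed balance relation $\pi_j \alpha_j = \pi_{j+1}\beta_{j+1}$ is used to build $\pi$. It recovers the Poisson and binomial cases listed above and checks the identity $E[Ag(\pi)] = 0$ for one bounded $g$.

```python
import math

def stationary_distribution(alpha, beta, size):
    """Equilibrium pmf of a birth-death process, truncated to {0, ..., size-1}.

    alpha(j) is the birth rate and beta(j) the death rate in state j.
    Detailed balance gives pi_j * alpha(j) = pi_{j+1} * beta(j+1).
    """
    weights = [1.0]
    for j in range(1, size):
        weights.append(weights[-1] * alpha(j - 1) / beta(j))
    total = sum(weights)
    return [w / total for w in weights]

def stein_expectation(pi, alpha, beta, g):
    """E[Ag(pi)] with Ag(j) = alpha(j) g(j+1) - beta(j) g(j)."""
    return sum(m * (alpha(j) * g(j + 1) - beta(j) * g(j)) for j, m in enumerate(pi))

def poisson_pmf(lam, size):
    return [math.exp(-lam) * lam**j / math.factorial(j) for j in range(size)]

def binomial_pmf(n, p):
    return [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

# Poisson case: alpha_j = lam, beta_j = j (truncated at 60 states).
lam = 2.5
pi_poisson = stationary_distribution(lambda j: lam, lambda j: j, 60)

# Binomial case: alpha_j = (n - j) p, beta_j = (1 - p) j (no truncation needed).
n, p = 12, 0.3
pi_binomial = stationary_distribution(lambda j: (n - j) * p, lambda j: (1 - p) * j, n + 1)
```

The truncation level 60 is an arbitrary choice that makes the neglected Poisson tail negligible for the value of `lam` used here.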

The present work is organized as follows. In Section 2, we will derive abstract error
bounds using Stein's method combined with some stochastic ordering assumptions in
the setting of approximation by the equilibrium distribution of a birth-death process.
In Section 3, a simple sufficient condition under which these stochastic ordering as-
sumptions hold is considered, and some applications are given. Section 4 discusses
Poisson approximation for a sum of dependent indicators. We will see how concepts of
negative and positive relation relate to our stochastic ordering assumptions, and present
generalizations of error bounds derived by Barbour et al. (1992). Based on this work
we move on, in Section 5, to consider translated Poisson approximation. Applications
here will include approximation of the number of k-runs in i.i.d. Bernoulli trials.
Finally, in Section 6, we give another abstract approximation theorem, and consider
its application to a sum of independent indicator random variables.






F. Daly, C. Lefevre and S. Utev 



2. An abstract approximation theorem 



Consider Stein's method for approximating the equilibrium distribution of a birth-
death process. Our purpose in this section is to derive abstract error bounds under
some stochastic ordering assumptions.

2.1. A first-order bound 

Suppose that $W$ is a random variable supported on (some subset of) $\mathbb{Z}^+$ with $\mu_j =
P(W = j)$, $j \ge 0$. Set $\mu_{-1} = 0$. Our concern is the approximation of such a variable
$W$ by $\pi$, specifically by estimating the difference $|Eh(W) - Eh(\pi)|$, i.e. $|E[Af(W)]|$.
For this, a simple representation of this difference will be applied with some stochastic
ordering assumptions to yield bounds using Stein's method. We may then bound, for
example, the total variation distance between $\mathcal{L}(W)$ and $\mathcal{L}(\pi)$, defined by

$$d_{TV}(\mathcal{L}(W), \mathcal{L}(\pi)) = \sup_{B \subseteq \mathbb{Z}^+} |P(W \in B) - P(\pi \in B)|.$$

Although we are mainly concerned with approximation in total variation distance, the
results we derive may also be used with other probability metrics.

Let $\Delta$ be the forward difference operator. Since, with the operator (1), the choice
of $f(0)$ is arbitrary, we follow Brown and Xia (2001) and choose $f(0) = 0$. Writing
$f(j) = \Delta f(0) + \cdots + \Delta f(j - 1)$, we thus obtain the representation

$$E[Af(W)] = \sum_{i=1}^{\infty} \Delta f(i-1) \sum_{j=i}^{\infty} (\alpha_{j-1}\mu_{j-1} - \beta_j\mu_j). \tag{3}$$

In the next subsection, we will extend (3) to include the $l$th forward differences of $f(\cdot)$,
for all $l \ge 1$.

We now consider how this representation may be applied in conjunction with the
usual stochastic ordering, denoted $\ge_{st}$. Define two random variables $W_\alpha$ and $W_\beta$ by

$$P(W_\alpha = j) = \frac{\alpha_{j-1}\mu_{j-1}}{E\alpha_W} \quad \text{and} \quad P(W_\beta = j) = \frac{\beta_j\mu_j}{E\beta_W}, \quad j \ge 1. \tag{4}$$

If $W_\alpha \ge_{st} W_\beta$ and $E\alpha_W \ge E\beta_W$, we have that
$\sum_{j=i}^{\infty} \alpha_{j-1}\mu_{j-1} \ge \sum_{j=i}^{\infty} \beta_j\mu_j$ for all $i \ge 1$. In this case, (3) may be bounded to obtain

$$|Eh(W) - Eh(\pi)| \le \|\Delta f\|_\infty E[\alpha_W(W + 1) - \beta_W W].$$

A similar argument holds if we instead assume that $W_\beta \ge_{st} W_\alpha$ and $E\beta_W \ge E\alpha_W$.
We thus obtain the following result.

Proposition 1. Assume that one of the two following conditions holds:

either (i) $W_\alpha \ge_{st} W_\beta$ with $E\alpha_W \ge E\beta_W$, or (ii) $W_\beta \ge_{st} W_\alpha$ with $E\beta_W \ge E\alpha_W$. (5)

Then,

$$|Eh(W) - Eh(\pi)| \le \|\Delta Sh\|_\infty |E[\alpha_W(W + 1) - \beta_W W]|. \tag{6}$$
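
To make Proposition 1 concrete, the following sketch evaluates both sides of (6) in one simple case. It is an illustration only, under assumptions that are ours: $W \sim \mathrm{Bin}(n, p)$, Poisson rates $\alpha_j = \lambda = np$ and $\beta_j = j$, test functions $h = I_B$, and the standard Poisson bound $\|\Delta Sh\|_\infty \le \lambda^{-1}(1 - e^{-\lambda})$ quoted later as (23). The helper names are hypothetical.

```python
import math

def binomial_pmf(n, p):
    return [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

def poisson_pmf(lam, size):
    return [math.exp(-lam) * lam**j / math.factorial(j) for j in range(size)]

def tail_sums(pmf):
    """Return [P(X >= 0), P(X >= 1), ...] for a pmf on {0, 1, ...}."""
    tails, running = [], 0.0
    for mass in reversed(pmf):
        running += mass
        tails.append(running)
    return tails[::-1]

n, p = 20, 0.1
lam = n * p
mu = binomial_pmf(n, p)

# Laws of W_alpha and W_beta from (4), with alpha_j = lam and beta_j = j.
e_alpha = lam
e_beta = sum(j * mu[j] for j in range(n + 1))
w_alpha = [0.0] + [lam * mu[j - 1] / e_alpha for j in range(1, n + 2)]
w_beta = [j * mu[j] / e_beta for j in range(n + 1)] + [0.0]

# Condition (i): W_alpha >=_st W_beta, checked through tail probabilities.
ordered = all(a >= b - 1e-12 for a, b in zip(tail_sums(w_alpha), tail_sums(w_beta)))

# Right-hand side of (6), using ||Delta Sh|| <= (1 - exp(-lam)) / lam.
moment_term = sum((lam * (j + 1) - j * j) * mu[j] for j in range(n + 1))
bound = (1 - math.exp(-lam)) / lam * abs(moment_term)

# Exact total variation distance between Bin(n, p) and Po(lam).
size = 80
po = poisson_pmf(lam, size)
mu_padded = mu + [0.0] * (size - len(mu))
d_tv = 0.5 * sum(abs(a - b) for a, b in zip(mu_padded, po))
```

In this case the moment term equals $np^2$, so the bound reduces to $p(1 - e^{-\lambda})$.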

2.2. An s-order bound

We will now establish our main abstract result. For that, we will have recourse to
the concept of discrete $s$-convex stochastic ordering, denoted $\ge_{s\text{-cx}}$, for any integer
$s \ge 1$. See, for example, Lefèvre and Utev (1996) for this notion. Briefly, given any
two non-negative integer-valued random variables $X$ and $Y$, one says that $X \le_{s\text{-cx}} Y$
when

$$E[f(X)] \le E[f(Y)] \quad \text{for all } s\text{-convex functions } f,$$

that is, for all functions $f$ satisfying $\Delta^s f(j) \ge 0$, $j \ge 0$. Note that this ordering implies
that $X$ and $Y$ have the same first $s - 1$ moments.

To begin with, we introduce a Bernoulli random variable $\nu_p$ with

$$P(\nu_p = 1) = p = 1 - P(\nu_p = 0),$$

independently of all other entries. We write $\alpha = E\alpha_W$, $\beta = E\beta_W$, and in an analogous
way to (4), we define the random variables $W_\alpha$ and $W_\beta$ by

$$P(W_\alpha \in B) = \alpha^{-1}E[\alpha_W I_{(W+1 \in B)}], \quad \text{and} \quad P(W_\beta \in B) = \beta^{-1}E[\beta_W I_{(W \in B)}], \tag{7}$$

for any Borel set $B$. For notational convenience, we choose to write $C_x^k = \binom{x}{k}$.

The key theorem and an immediate corollary will be first stated, the proof of the
theorem being given after.

Proposition 2. Assume that there exists a random variable $Y$ on $\mathbb{Z}^+$ such that $W_\beta -
Y \ge 0$ a.s. and

$$W_\alpha \ge_{s\text{-cx}} \nu_p(W_\beta - Y). \tag{8}$$

Then,

$$|Eh(W) - Eh(\pi)| \le \sum_{i=0}^{s-1} |\Delta^i Sh(0)|\,|E(\alpha_W C^i_{W+1}) - E(\beta_W C^i_W)|
+ \|\Delta^s Sh\|_\infty \left(\alpha E[C^s_{W_\alpha}] - 2\alpha p E[C^s_{W_\beta - Y}] + (\alpha p + |\alpha p - \beta|)E[C^s_{W_\beta}]\right). \tag{9}$$

Consider the special case of (8) when $p = 1$ and $Y = 0$ a.s. When $\alpha = \beta$ and under
the condition (10) below, one has that

$$E[\alpha_W(W + 1)^i] = E[\beta_W W^i], \quad i = 0, \ldots, s - 1,$$

so that the inequality (9) reduces to (11).

Corollary 1. Assume that $\alpha = \beta$, and one of the two following conditions holds:

either (i) $W_\alpha \ge_{s\text{-cx}} W_\beta$, or (ii) $W_\beta \ge_{s\text{-cx}} W_\alpha$. (10)

Then,

$$|Eh(W) - Eh(\pi)| \le \|\Delta^s Sh\|_\infty |E[\alpha_W C^s_{W+1}] - E[\beta_W C^s_W]|. \tag{11}$$

We note that Proposition 1 does not follow as a special case of Corollary 1 since
this latter result requires the condition $\alpha = \beta$, which is not needed in Proposition 1.

Proof of Proposition 2. In the first step we derive a representation of $E[Af(W)]$
that generalizes the representation (3). Observe that (1) and (7) give

$$E[Af(W)] = E[\alpha_W f(W + 1)] - E[\beta_W f(W)] = \alpha E[f(W_\alpha)] - \beta E[f(W_\beta)].$$

Expanding the function $f$ by the discrete Taylor formula, we obtain, for any $s =
1, 2, \ldots$,

$$f(x) = f(0) + \sum_{k=0}^{\infty} \Delta f(k) I_{(x > k)} = \sum_{i=0}^{s-1} \Delta^i f(0) C^i_x + \sum_{k=0}^{\infty} \Delta^s f(k) C^{s-1}_{x-k-1};$$

see Lefèvre and Utev (1996). Thus, we find that

$$E[Af(W)] = \sum_{i=0}^{s-1} \Delta^i f(0) E[\alpha_W C^i_{W+1} - \beta_W C^i_W]
+ \sum_{k=0}^{\infty} \Delta^s f(k) \left(\alpha E[C^{s-1}_{W_\alpha-k-1}] - \beta E[C^{s-1}_{W_\beta-k-1}]\right). \tag{12}$$

Our next step is to derive an abstract metrics-ordering relationship result, which is
stated below as a separate lemma. Using the bound (14) in the representation (12)
then leads to the announced bound (9).

Lemma 1. Let $X$, $Y$ and $Z$ be random variables on $\mathbb{Z}^+$ such that

$$Z - Y \ge 0 \text{ a.s., and } X \ge_{s\text{-cx}} \nu_p(Z - Y). \tag{13}$$

Then, for all $a, b \ge 0$,

$$\sum_{k=0}^{\infty} |aE[C^{s-1}_{X-k-1}] - bE[C^{s-1}_{Z-k-1}]| \le aE[C^s_X] - 2apE[C^s_{Z-Y}] + (ap + |ap - b|)E[C^s_Z]. \tag{14}$$

Proof. Letting $w_k(x) = C^{s-1}_{x-k-1}$, we get that

$$\sum_{k=0}^{\infty} |aE(C^{s-1}_{X-k-1}) - bE(C^{s-1}_{Z-k-1})| = \sum_{k=0}^{\infty} |aE[w_k(X)] - bE[w_k(Z)]|$$
$$\le a\sum_{k=0}^{\infty} |E[w_k(X)] - E[w_k(\nu_p(Z - Y))]| + a\sum_{k=0}^{\infty} |E[w_k(\nu_pZ)] - E[w_k(\nu_p(Z - Y))]|$$
$$+ \sum_{k=0}^{\infty} |aE[w_k(\nu_pZ)] - bE[w_k(Z)]| = S_1 + S_2 + S_3. \tag{15}$$

Let us examine the three sums in (15). First, we easily check that

$$\sum_{k=0}^{\infty} E[w_k(Z)] = E[C^s_Z]. \tag{16}$$

Using (16) we successively find that

$$S_3 = |ap - b| \sum_{k=0}^{\infty} E[w_k(Z)] = |ap - b|E[C^s_Z];$$

since $Z - Y \ge 0$ and $Z \ge_{st} Z - Y$,

$$S_2 = ap \sum_{k=0}^{\infty} (E[w_k(Z)] - E[w_k(Z - Y)]) = ap(E[C^s_Z] - E[C^s_{Z-Y}]);$$

finally, by the assumption (13) and a standard property of the order $\ge_{s\text{-cx}}$,

$$S_1 = a \sum_{k=0}^{\infty} [Ew_k(X) - pEw_k(Z - Y)] = a(E[C^s_X] - pE[C^s_{Z-Y}]).$$

Inserting these three terms in (15), we then deduce the bound (14).

Remark 1. For $s = p = 1$ and $a = b = 1$, Lemma 1 states that if $X \ge_{st} Z - Y \ge 0$,
then an upper bound for the Wasserstein distance between $\mathcal{L}(X)$ and $\mathcal{L}(Z)$ is

$$d_W(\mathcal{L}(X), \mathcal{L}(Z)) = \sum_{k=0}^{\infty} |P(X > k) - P(Z > k)| \le 2EY + EX - EZ. \tag{17}$$

This bound is of interest in the stochastic ordering context investigated by Kamae et
al. (1977), with random variables on $\mathbb{Z}^+$ here. Note that by choosing the optimal
coupling $X$, $Z$ and $Y = (Z - X)^+$, (17) gives the exact bound since

$$d_W(\mathcal{L}(X), \mathcal{L}(Z)) \le 2E(Z - X)^+ + EX - EZ = E|Z - X| = d_W(\mathcal{L}(X), \mathcal{L}(Z)).$$

It is worth indicating that an analogous argument allows us to show that the same
bound (17) holds under the single condition $X + Y \ge_{st} Z$. A priori, this result seems
to be preferable, since the extra condition $Z - Y \ge 0$ is not required. One can see,
however, that $X \ge_{st} Z - Y$ does not imply $X + Y \ge_{st} Z$ in general. As an example,
choose $X = U$, $Y = U$ and $Z = n$ a.s., where $n$ is any fixed positive integer and $U$
is discrete uniform on the set $\{0, 1, \ldots, n\}$. Then, $X = U =_d n - U = Z - Y$ so that
$X \ge_{st} Z - Y$, but $X + Y = 2U$ is not $\ge_{st}$ than $n = Z$.
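
Two of the facts used above lend themselves to a quick exact check: the summation identity (16), and the uniform example closing Remark 1. The sketch below is illustrative only; the function names are hypothetical, and the convention that $C_x^k$ vanishes for $x < 0$ is our reading of the discrete Taylor formula.

```python
import math
from fractions import Fraction

def binom(x, k):
    """C_x^k, taken to be zero when x < 0 (math.comb already returns 0 for k > x >= 0)."""
    return math.comb(x, k) if x >= 0 else 0

def identity_16_holds(z, s):
    """Check sum_k C_{z-k-1}^{s-1} == C_z^s for a fixed integer z >= 0."""
    return sum(binom(z - k - 1, s - 1) for k in range(z + 1)) == binom(z, s)

def remark_1_example(n):
    """X = U, Y = U (the same U) and Z = n, with U uniform on {0, ..., n}."""
    mass = Fraction(1, n + 1)
    # Wasserstein distance: sum over k of |P(X > k) - P(Z > k)|; terms with k >= n vanish.
    d_w = sum(abs(Fraction(n - k, n + 1) - 1) for k in range(n))
    e_u = sum(u * mass for u in range(n + 1))
    bound = 2 * e_u + e_u - n  # 2EY + EX - EZ
    # X + Y = 2U >=_st n would force P(2U >= n) = 1.
    prob_2u_at_least_n = sum(mass for u in range(n + 1) if 2 * u >= n)
    return d_w, bound, prob_2u_at_least_n
```

For $n = 4$ the distance and the bound in (17) both equal 2, while $P(2U \ge 4) = 3/5 < 1$, so $X + Y \ge_{st} Z$ indeed fails.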

3. A simple sufficient condition and examples 

In practice, it may be difficult to check directly such conditions as stochastic ordering
between $W_\alpha$ and $W_\beta$, as required by (5) and (10). It is thus useful to have available a
simple sufficient condition which we may then apply.

Throughout this section, we assume that $\alpha = \beta$ and that $W_\alpha$ and $W_\beta$ have equal
moments of order $i = 1, \ldots, s - 1$. That is, we assume

condition $(A_s)$: $E[\alpha_W(W + 1)^i] = E[\beta_W W^i]$, $i = 0, \ldots, s - 1$.

A well-known Karlin-Novikoff sufficient condition to guarantee the $s$-convex ordering
in (10) under $(A_s)$ is that our sequence $\{\alpha_{j-1}\mu_{j-1} - \beta_j\mu_j\}$ has at most $s$ changes of
sign.

Proposition 3. Suppose that the condition $(A_s)$ is satisfied and that the sequence
$\{\alpha_{j-1}\mu_{j-1} - \beta_j\mu_j\}$ has at most $s$ changes of sign. Then (10) holds.

As a consequence, we obtain the following corollary, which extends Proposition A.1 of
Barbour and Pugliese (2000) to birth-death processes.

Corollary 2. Suppose that $E\alpha_W = E\beta_W$. If the sequence $\{\alpha_{j-1}\mu_{j-1} - \beta_j\mu_j\}$ is
monotone, then $W_\alpha$ and $W_\beta$ are stochastically ordered, so that the inequality (6) may
be applied.
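
The sign-change criterion of Proposition 3 is straightforward to test numerically. The sketch below is an illustration under assumptions of our own choosing: $W \sim \mathrm{Bin}(n, p)$ with the Poisson rates $\alpha_j = \lambda = np$ and $\beta_j = j$, so that $(A_1)$ holds, and a small tolerance `tol` to ignore floating-point zeros. The function names are hypothetical.

```python
import math

def binomial_pmf(n, p):
    return [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

def sign_changes(sequence, tol=1e-12):
    """Count sign changes, ignoring entries within tol of zero."""
    signs = [1 if x > 0 else -1 for x in sequence if abs(x) > tol]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def balance_sequence(mu, alpha, beta):
    """Terms alpha(j-1) * mu[j-1] - beta(j) * mu[j] for j = 1, ..., len(mu)."""
    padded = list(mu) + [0.0]
    return [alpha(j - 1) * padded[j - 1] - beta(j) * padded[j]
            for j in range(1, len(padded))]

n, p = 10, 0.25
lam = n * p
seq = balance_sequence(binomial_pmf(n, p), lambda j: lam, lambda j: j)
changes = sign_changes(seq)
```

Here the sequence sums to $E\alpha_W - E\beta_W = 0$ and changes sign once, which is the $s = 1$ case of the criterion.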

We illustrate these results with the following examples. 

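Example 1. Our first example is motivated by Phillips and Weinberg (2000). Let $W$
have a Bose-Einstein occupancy distribution. That is, given $m, d \ge 1$,

$$P(W = j) = \binom{d + m - j - 2}{m - j} \binom{d + m - 1}{m}^{-1}, \quad 0 \le j \le m.$$
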
We wish to approximate $W$ by $\pi \sim \mathrm{Geom}(p)$ where $p = (d - 1)/(d + m - 1)$. Let
$q = 1 - p$. To obtain our geometric law, we choose $\alpha_j = q$ and $\beta_j = I_{(j > 0)}$, $j \ge 0$, as
birth and death rates.

Firstly, one can easily check that in this case, $E\alpha_W = E\beta_W$ and the sequence
$\{q\mu_{j-1} - \mu_j\}$ is non-decreasing, so that $W_\alpha \ge_{st} W_\beta$. Using Corollary 2, the bound (6)
then becomes

$$|Eh(W) - Eh(\pi)| \le p\|\Delta Sh\|_\infty |EW - E\pi|. \tag{18}$$

Moreover, it is known (see Peköz (1996, Section 2)) that the Stein operator $S$ admits
here the representation

$$Sh(k) = -\sum_{i=k}^{\infty} q^{i-k}\,(h(i) - E[h(\pi)]), \quad k \ge 1.$$

From this, we find that $\Delta Sh(k) = -\sum_{i=k}^{\infty} \Delta h(i)q^{i-k}$, which leads to the bound

$$\|\Delta Sh\|_\infty \le p^{-1}\|\Delta h\|_\infty.$$

Inserting this bound in (18) yields the following.

Corollary 3. With $W$ and $\pi$ as above,

$$|Eh(W) - Eh(\pi)| \le \|\Delta h\|_\infty \frac{m}{d(d - 1)}.$$

In particular, $d_{TV}(\mathcal{L}(W), \mathcal{L}(\pi)) \le m/(d(d - 1))$.
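
A numerical sanity check of the total variation bound in Corollary 3 can be sketched as follows. This is an illustration only and rests on an assumption of ours: the Bose-Einstein occupancy law is taken to be $P(W = j) = \binom{d+m-j-2}{m-j}/\binom{d+m-1}{m}$, $0 \le j \le m$, which has $P(W = 0) = p$ and mean $m/d$. The values of $m$ and $d$ and all function names are hypothetical.

```python
import math

def bose_einstein_pmf(m, d):
    """Assumed occupancy pmf: P(W = j) = C(d+m-j-2, m-j) / C(d+m-1, m), d >= 2."""
    total = math.comb(d + m - 1, m)
    return [math.comb(d + m - j - 2, m - j) / total for j in range(m + 1)]

def geometric_pmf(p, size):
    """Geom(p) on {0, 1, ...}: P(pi = j) = p * (1 - p)**j, truncated to size terms."""
    return [p * (1 - p) ** j for j in range(size)]

def total_variation(pmf_a, pmf_b):
    """Half the L1 distance between two pmfs on {0, 1, ...}, padding with zeros."""
    size = max(len(pmf_a), len(pmf_b))
    a = pmf_a + [0.0] * (size - len(pmf_a))
    b = pmf_b + [0.0] * (size - len(pmf_b))
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

m, d = 5, 10
p = (d - 1) / (d + m - 1)
mu = bose_einstein_pmf(m, d)
mean_w = sum(j * mass for j, mass in enumerate(mu))
d_tv = total_variation(mu, geometric_pmf(p, 400))
bound = m / (d * (d - 1))
```

With these values the computed distance lies below the bound $m/(d(d-1)) = 1/18$.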
Example 2. Our next examples centre around approximation by so-called polyno-
mial birth-death distributions, defined by Brown and Xia (2001) as the equilibrium
distribution of a birth-death process with birth and death rates $\alpha_j$ and $\beta_j$ which are
polynomial in $j$. With such choices, we will write $\pi \sim \mathrm{PBD}(\alpha_j, \beta_j)$.

Suppose that $W$ satisfies $\mu_j = (a + bj^{-1})\mu_{j-1}$ for some $a, b \in \mathbb{R}$. That is, $W$ belongs
to the Katz (or Panjer) family of distributions (see Johnson et al. (1992, Section 2.3.1)).
It is well known that in this case $W$ must have either a binomial, Poisson or
negative binomial distribution.

We fix some $l \ge 1$ and consider the approximation of $W$ by the polynomial birth-
death distribution $\pi \sim \mathrm{PBD}(\alpha, jQ_{l-1}(j))$. Here we have chosen a constant birth rate $\alpha$
and a death rate $\beta_j = jQ_{l-1}(j)$, where $Q_{l-1}(j)$ is a non-decreasing, monic polynomial
in $j$ of degree $l - 1$. This gives us $l$ parameters needed to specify the distribution of $\pi$.
We choose these parameters in such a way that the condition $(A_l)$ is satisfied.

With our choice of birth and death rates we have that

$$\alpha_{j-1}\mu_{j-1} - \beta_j\mu_j = \left(\alpha - ajQ_{l-1}(j) - bQ_{l-1}(j)\right)\mu_{j-1}.$$

Noting that $\alpha - ajQ_{l-1}(j) - bQ_{l-1}(j)$ is a polynomial of degree $l$ in $j$, and therefore
has at most $l$ real roots, we have that the sequence $\{\alpha_{j-1}\mu_{j-1} - \beta_j\mu_j\}$ has at most $l$
changes of sign, so that either $W_\alpha \ge_{l\text{-cx}} W_\beta$ or $W_\beta \ge_{l\text{-cx}} W_\alpha$.
Theorem 2.10 of Brown and Xia (2001) gives us that

$$\sup\{\|\Delta Sh\|_\infty : h(j) = I_{(j \in B)}, B \subseteq \mathbb{Z}^+\} \le \alpha^{-1}.$$

Hence, with $h(j) = I_{(j \in B)}$ for some $B \subseteq \mathbb{Z}^+$,

$$\|\Delta^l Sh\|_\infty \le 2^{l-1}\|\Delta Sh\|_\infty \le 2^{l-1}\alpha^{-1}.$$

From Corollary 1 we thus obtain Corollary 4.
Corollary 4. With $W$ and $\pi$ as above,

$$d_{TV}(\mathcal{L}(W), \mathcal{L}(\pi)) \le 2^{l-1}\alpha^{-1} \left| E\left[\alpha C^l_{W+1} - WQ_{l-1}(W)C^l_W\right] \right|. \tag{19}$$

For example, consider the case where $W \sim \mathrm{Bin}(n, p)$ and $\pi \sim \mathrm{PBD}(\alpha, \gamma j + j(j - 1))$,
so that $l = 2$. Choosing our constants $\alpha$ and $\gamma$ according to the prescription above,
straightforward calculations give us that

$$\alpha = n(n - 1)p(1 - p), \quad \text{and} \quad \gamma = (n - 1)(1 - 2p).$$

Furthermore,

$$E[W(W + 1)] = np(np + 2 - p), \quad E[W^2(W - 1)] = n(n - 1)p^2(np + 2 - 2p), \quad \text{and}$$
$$E[W^2(W - 1)^2] = n(n - 1)p^2(n^2p^2 + 4np - 5np^2 - 8p + 6p^2 + 2).$$

Evaluating the bound (19) then gives

Corollary 5. Assume that $W \sim \mathrm{Bin}(n, p)$ and $\pi \sim \mathrm{PBD}(\alpha, \gamma j + j(j - 1))$. Then,

$$d_{TV}(\mathcal{L}(W), \mathcal{L}(\pi)) \le 2p^2. \tag{20}$$

We note that (19) does not necessarily give a bound of the optimal order. In the
case covered by (20), Theorem 3.1 of Brown and Xia (2001) gives a bound on total
variation distance of order $O(p^2/\sqrt{\lambda})$, where $\lambda = E[W] = np$. This disparity is due
to our rather crude use of the supremum norm in obtaining bounds such as (19).
In Sections 5 and 6, we will consider more refined ways to bound the terms of our
Stein equation in some particular cases when we have two parameters to choose in our
approximating distribution $\pi$. Despite this disadvantage, we nevertheless note that
(19) gives an explicit bound which may be applied in many contexts.
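
The "straightforward calculations" behind the binomial case can be reproduced in exact rational arithmetic. The sketch below is an illustration under our own reading of the bound: for $l = 2$ we evaluate $2\alpha^{-1}|E[\alpha\binom{W+1}{2} - \beta_W\binom{W}{2}]|$ with $\beta_j = \gamma j + j(j-1)$, $\alpha = n(n-1)p(1-p)$ and $\gamma = (n-1)(1-2p)$, and compare it with $2p^2$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_moment(n, p, func):
    """E[func(W)] for W ~ Bin(n, p), computed exactly when p is a Fraction."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) * func(j) for j in range(n + 1))

def pbd_parameters(n, p):
    """Constants alpha and gamma chosen so that condition (A_2) holds."""
    return n * (n - 1) * p * (1 - p), (n - 1) * (1 - 2 * p)

def second_order_bound(n, p):
    """2/alpha * |E[alpha*C(W+1,2) - (gamma*W + W(W-1))*C(W,2)]| for W ~ Bin(n, p)."""
    alpha, gamma = pbd_parameters(n, p)
    inner = binomial_moment(
        n, p,
        lambda j: alpha * comb(j + 1, 2) - (gamma * j + j * (j - 1)) * comb(j, 2),
    )
    return 2 / alpha * abs(inner)

def condition_a2_gap(n, p, i):
    """E[alpha (W+1)^i] - E[beta_W W^i]; zero for i = 0, 1 under (A_2)."""
    alpha, gamma = pbd_parameters(n, p)
    return binomial_moment(
        n, p, lambda j: alpha * (j + 1)**i - (gamma * j + j * (j - 1)) * j**i
    )
```

Because `Fraction` arithmetic is exact, the agreement with $2p^2$ is an identity for each tested pair $(n, p)$ rather than a floating-point approximation.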



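Example 3. Our final example of this section focuses on mixture distributions of the
polynomial birth-death type. Suppose that $\pi \sim \mathrm{PBD}(\alpha, \beta_j)$ and $W \sim \mathrm{PBD}(\xi, \beta_j)$, for
some constant birth rate $\alpha$, polynomial death rate $\beta_j$ and random variable $\xi$ on $\mathbb{R}^+$.
In this case we have that

$$\mu_j = E\left[\mu_0(\xi)\frac{\xi^j}{\prod_{i=1}^{j}\beta_i}\right], \quad j \ge 0, \tag{21}$$

where $\mu_0(\xi)$ denotes the mass at $0$ of the $\mathrm{PBD}(\xi, \beta_j)$ distribution given $\xi$.

We choose $\alpha$ such that $\alpha = E\beta_W$, that is,

$$\alpha = \sum_{j=0}^{\infty} \beta_j\mu_j = E\left[\xi \sum_{j=1}^{\infty} \mu_0(\xi)\frac{\xi^{j-1}}{\prod_{i=1}^{j-1}\beta_i}\right] = E\xi.$$

Using (21), we obtain

$$\alpha\mu_j - \beta_{j+1}\mu_{j+1} = E\left[\mu_0(\xi)\frac{(\alpha - \xi)\xi^j}{\prod_{i=1}^{j}\beta_i}\right], \quad j \ge 0.$$

From this, we can see that the sequence $\{\alpha\mu_j - \beta_{j+1}\mu_{j+1}\}$ is monotone. Hence,
Corollary 2 gives us the following.

Corollary 6. With $W$ and $\pi$ as above,

$$|Eh(W) - Eh(\pi)| \le \|\Delta Sh\|_\infty |E[\alpha(W + 1) - \beta_W W]|. \tag{22}$$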

For example, if $\beta_j = j$ then $W \sim \mathrm{Po}(\xi)$ and we take $\pi \sim \mathrm{Po}(\lambda)$, where $\lambda = E\xi$.
Using the well-known bound on the Stein operator $S$ in this case, namely

$$\|\Delta Sh\|_\infty \le \lambda^{-1}(1 - e^{-\lambda}), \tag{23}$$

evaluating (22) gives, after some straightforward calculation,

$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le \lambda^{-1}(1 - e^{-\lambda})\mathrm{Var}(\xi),$$

a bound that has also been obtained by Barbour et al. (1992, Theorem 1.C).
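
The mixed Poisson bound can be illustrated with a small computation. The sketch below is hypothetical throughout: the two-point mixing law for $\xi$, the truncation level `size` and the function names are all choices of ours, made only to compare the exact total variation distance with $\lambda^{-1}(1 - e^{-\lambda})\mathrm{Var}(\xi)$.

```python
import math

def poisson_pmf(lam, size):
    return [math.exp(-lam) * lam**j / math.factorial(j) for j in range(size)]

def mixed_poisson_pmf(values, weights, size):
    """pmf of W ~ Po(xi), where xi equals values[i] with probability weights[i]."""
    components = [poisson_pmf(v, size) for v in values]
    return [sum(w * c[j] for w, c in zip(weights, components)) for j in range(size)]

# Hypothetical two-point mixing law: xi = 1 or 3 with probability 1/2 each.
values, weights = [1.0, 3.0], [0.5, 0.5]
lam = sum(w * v for w, v in zip(weights, values))
var_xi = sum(w * (v - lam) ** 2 for w, v in zip(weights, values))

size = 80  # truncation level; the neglected tails are negligible here
mixed = mixed_poisson_pmf(values, weights, size)
d_tv = 0.5 * sum(abs(a - b) for a, b in zip(mixed, poisson_pmf(lam, size)))
bound = (1 - math.exp(-lam)) / lam * var_xi
```

Here $\lambda = 2$ and $\mathrm{Var}(\xi) = 1$, so the bound equals $(1 - e^{-2})/2$.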

4. Poisson approximation for a sum of indicators 

Throughout this section, the random variable $W$ of interest is a sum of indicators:

$$W = X_1 + \cdots + X_n,$$

where the $X_i$ are Bernoulli variables, possibly dependent, with

$$p_i = P(X_i = 1) = 1 - P(X_i = 0), \quad 1 \le i \le n.$$

Using Propositions 1 and 2, we are going to investigate the approximation of the sum
by a Poisson random variable $\pi \sim \mathrm{Po}(\lambda)$.

Recall that our Poisson variable is derived from (1) when $\alpha_j = \lambda$ and $\beta_j = j$, so
that by (7),

$$W_\alpha = W + 1, \quad \text{and} \quad P(W_\beta \in B) = \frac{E[WI_{(W \in B)}]}{EW}, \tag{24}$$

for any Borel set $B$. In the analysis, an important role will be played by the variables

$$W_i = W - X_i, \quad 1 \le i \le n.$$

4.1. Total dependence

Firstly, we consider the case where the indicators $X_i$ are totally negatively dependent
in the sense of Papadatos and Papathanasiou (2002). Let us recall that $n$ random
variables $X_i$, $1 \le i \le n$, are totally negatively dependent (TND) if

$$\mathrm{Cov}[g_1(X_i), g_2(W_i)] \le 0, \quad 1 \le i \le n, \tag{25}$$

for all non-decreasing functions $g_1$, $g_2$ such that the covariance exists.

Papadatos and Papathanasiou (2002, Theorem 3.1) show that the class of TND
indicators includes the standard class of negatively related indicators. Stein's method
for Poisson approximation of a sum of negatively related indicators is discussed by, for
example, Barbour et al. (1992) and Erhardsson (2005). Recall that indicator random
variables $X_1, \ldots, X_n$ are said to be negatively related if

$$E[g(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n) \mid X_i = 1] \le E[g(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)], \quad 1 \le i \le n, \tag{26}$$

for all non-decreasing functions $g : \{0, 1\}^{n-1} \to \{0, 1\}$.

We wish to bound the total variation distance between $\mathcal{L}(W)$ and $\mathrm{Po}(\lambda)$. For that,
we will apply Proposition 1. By (24), we have that, for any function $g : \mathbb{Z}^+ \to \mathbb{R}$,

$$Eg(W_\alpha) = Eg(W + 1), \quad \text{and} \quad Eg(W_\beta) = \frac{E[Wg(W)]}{EW}.$$

Thus, to show that $W_\alpha \ge_{st} W_\beta$, we must prove that if $g$ is non-decreasing, then
$EW\,Eg(W + 1) \ge E[Wg(W)]$. In fact, this was established by Papadatos and Pap-
athanasiou (2002, Lemma 3.1).

Using the bound on the Stein operator in the Poisson case, we obtain
the following result.

Theorem 1. If the indicators $\{X_i : 1 \le i \le n\}$ are TND, then $W_\alpha \ge_{st} W_\beta$. If, in
addition, $EW \ge \lambda$, then

$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le \frac{1 - e^{-\lambda}}{\lambda}\left([\lambda + 1]EW - E[W^2]\right).$$

Further results on, and examples of, TND indicator random variables can be found in
Papadatos and Papathanasiou (2002).

Let us now consider the case where the indicators $X_i$ are positively dependent in a
certain sense. We adapt the definition (25) and say that $n$ random variables $X_1, \ldots, X_n$
are totally positively dependent (TPD) if

$$\mathrm{Cov}[g_1(X_i), g_2(W_i)] \ge 0, \quad 1 \le i \le n,$$

for all non-decreasing functions $g_1$, $g_2$ such that the covariance exists.

Association or positive relation is sufficient for TPD. This can be established anal-
ogously to the proof of Theorem 3.1 of Papadatos and Papathanasiou (2002). Recall
that our indicator random variables are said to be positively related if (26) holds with
the inequality reversed for all non-decreasing functions $g : \{0, 1\}^{n-1} \to \{0, 1\}$. This
standard property is used with Stein's method by, for example, Barbour et al. (1992)
and Erhardsson (2005).

In the sequel, it is assumed that $EW = \lambda$. To get a bound for the total variation
distance, we will apply Proposition 2, using the lemma stated below. To begin with,
we introduce a random variable $X_V$, a mixing of our $n$ indicators, in which the index
$V$ is a random variable of law

$$P(V = i) = \frac{EX_i}{\lambda}, \quad 1 \le i \le n. \tag{27}$$

Lemma 2. If $EW = \lambda$ and the indicators $\{X_i : 1 \le i \le n\}$ are TPD, then

$$W_\beta \ge_{st} W_\alpha - X_V, \tag{28}$$

where $W_\alpha - X_V \ge 0$ a.s.

Proof. As seen in (24), $W_\alpha = W + 1$ and thus, $W_\alpha - X_V \ge 0$ a.s. Moreover, $W_\beta$ has
the so-called $W$-size-biased distribution: see, for example, Goldstein and Rinott (1996).
$W$ being a sum of indicators, it is then known that $W_\beta$ admits the representation

$$W_\beta = \sum_{i \ne V} \tilde{X}_i + 1, \tag{29}$$

where $V$ is a random variable of law (27), and if $V = v$,

$$\tilde{X}_i =_d (X_i \mid X_v = 1), \quad i \ne v.$$

Thus, by (29), the ordering (28) is equivalent to $\sum_{i \ne V} \tilde{X}_i \ge_{st} W - X_V$. To establish
this, it is enough to prove that

$$\sum_{i \ne v} \tilde{X}_i \ge_{st} W - X_v, \quad 1 \le v \le n;$$

see Shaked and Shanthikumar (2007). Now, by (29) and the TPD assumption, we get,
for any real $a > 0$,

$$P\Big(\sum_{i \ne v} \tilde{X}_i > a\Big) = P\Big(\sum_{i \ne v} X_i > a \,\Big|\, X_v = 1\Big) \ge P\Big(\sum_{i \ne v} X_i > a\Big) = P(W - X_v > a),$$

which is the desired result.

Thanks to Lemma 2, we may apply Proposition 2 with $s = p = 1$. Noting that by
(27),

$$EX_V = \sum_{i=1}^{n} p_iP(V = i) = \frac{1}{\lambda}\sum_{i=1}^{n} p_i^2,$$

we then get the following result.

Theorem 2. If $EW = \lambda$ and the indicators $\{X_i : 1 \le i \le n\}$ are TPD, then

$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le \frac{1 - e^{-\lambda}}{\lambda}\left(E[W^2] + 2\sum_{i=1}^{n} p_i^2 - \lambda(\lambda + 1)\right).$$

This bound is obtained (and applied) by Barbour et al. (1992, Corollary 2.C.4)
under the condition of positive relation. See also Erhardsson (2005).
4.2. Local dependence 

Our goal in this part is to combine the previous s-convex ordering approach with a 
more flexible property of dependence. More precisely, we first introduce a concept of 
local dependence between a set of n indicators Xi, . . . , Xn. 

Let $\mathcal{F}_s$ be the class of all functions $g : \{0, 1, \ldots, n\} \to \mathbb{R}^+$ that are non-decreasing
and $s$-convex with $g(0) = 0$. We say that the $n$ indicators $X_1, \ldots, X_n$ are $(s, \delta)$-locally
negatively dependent ($(s, \delta)$-LND) if there exist $n$ non-negative reals $\delta_1, \ldots, \delta_n$ (of sum
$\delta > 0$) such that

$$E[X_ig(W_i)] \le \delta_iE[g(W_i)] \quad \text{for all functions } g \in \mathcal{F}_s, \quad 1 \le i \le n. \tag{30}$$

Similarly, $X_1, \ldots, X_n$ are said to be $(s, \delta)$-locally positively dependent ($(s, \delta)$-LPD) if

$$E[X_ig(W_i)] \ge \delta_iE[g(W_i)] \quad \text{for all functions } g \in \mathcal{F}_s, \quad 1 \le i \le n. \tag{31}$$

Let $\delta := \delta_1 + \cdots + \delta_n$, and denote

$$P(V = i) = \frac{\delta_i}{\delta}, \quad 1 \le i \le n, \quad \text{and} \quad p = \frac{EW}{\delta} \wedge \frac{\delta}{EW}.$$

We then adopt the notation $\nu_p$ and $X_V$ of Sections 2 and 4.1.
Lemma 3. If the indicators $\{X_i : 1 \le i \le n\}$ are $(s, \delta)$-LND, then

$$W_\alpha \ge_{s\text{-cx}} \nu_pW_\beta, \tag{32}$$

while if the indicators $\{X_i : 1 \le i \le n\}$ are $(s, \delta)$-LPD, then

$$W_\beta \ge_{s\text{-cx}} \nu_p(W_\alpha - X_V). \tag{33}$$

Proof. The method of proof is built on ideas in Barbour et al. (1992), Goldstein
and Rinott (1996), Papadatos and Papathanasiou (2002) and Reinert (2005). Let $g$ be
any function belonging to $\mathcal{F}_s$. As a preliminary, we observe that $W \le W_i + 1 \le W + 1$
a.s. for each $i = 1, \ldots, n$.

Now, consider the case of $(s, \delta)$-LND. Using (30) and the assumption that $g$ is non-
decreasing, we obtain that

$$E[Wg(W)] = \sum_{i=1}^{n} E[X_ig(W)] = \sum_{i=1}^{n} E[X_ig(W_i + 1)] \le \sum_{i=1}^{n} \delta_iE[g(W_i + 1)]
\le \sum_{i=1}^{n} \delta_iE[g(W + 1)] = \delta E[g(W_\alpha)].$$

As $g(0) = 0$, and $EW/\delta \ge p \in (0, 1]$, we find from (24) that

$$E[g(W_\alpha)] \ge \frac{EW}{\delta}E[g(W_\beta)] \ge pE[g(W_\beta)] = E[g(\nu_pW_\beta)],$$

hence the ordering (32).

The case of $(s, \delta)$-LPD is treated similarly. By (31) and since $g$ is non-decreasing,
we get

$$E[Wg(W)] = \sum_{i=1}^{n} E[X_ig(W_i + 1)] \ge \sum_{i=1}^{n} \delta_iE[g(W_i + 1)]
= \delta\sum_{i=1}^{n} P(V = i)E[g(W + 1 - X_i)] = \delta E[g(W_\alpha - X_V)]. \tag{34}$$

As before, we then deduce from (34) that

$$E[g(W_\beta)] = \frac{E[Wg(W)]}{EW} \ge \frac{\delta}{EW}E[g(W_\alpha - X_V)] \ge pE[g(W_\alpha - X_V)] = E[g(\nu_p(W_\alpha - X_V))],$$

proving the ordering (33).

Combining Proposition 2 and Lemma 3 would then allow us to derive an upper
bound for the total variation distance.

4.3. Approximate local dependence 

Approximate local dependence is becoming a rather popular topic in probability. 
For works related to this idea, see for example Chen (1975), Barbour et al. (1992) and 
Chatterjee et al. (2005). We wish now to derive an abstract Poisson approximation
theorem by combining stochastic ordering with such an approach.

We say that the $n$ indicators $X_1, \ldots, X_n$ are approximately locally negatively de-
pendent (ALND) if there exist $n$ non-negative reals $\delta_1, \ldots, \delta_n$ (of sum $\delta > 0$), and $n$
random variables $Y_1, \ldots, Y_n$ on $\mathbb{Z}^+$ such that

$$E[X_ig(W_i - Y_i)] \le \delta_iE[g(W_i - Y_i)], \quad 1 \le i \le n, \tag{35}$$

for all non-negative, non-decreasing functions $g$. Similarly, $X_1, \ldots, X_n$ are said to be
approximately locally positively dependent (ALPD) if

$$E[X_ig(W_i - Y_i)] \ge \delta_iE[g(W_i - Y_i)], \quad 1 \le i \le n, \tag{36}$$

for all non-negative, non-decreasing functions $g$.
Define

$$\varepsilon = \sum_{i=1}^{n} E[X_iY_i], \quad \text{and} \quad \varepsilon_* = \varepsilon + \sum_{i=1}^{n} \delta_iE[X_i + Y_i],$$

and let

$$c_\lambda = (\lambda + 1)(1 - e^{-\lambda})/\lambda + 2d_\lambda, \quad \text{with} \quad d_\lambda = 1 \wedge \sqrt{2/(e\lambda)}.$$

Theorem 3. If $EW = \lambda$ and the indicators $\{X_i : 1 \le i \le n\}$ are ALND, then

$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le \frac{1 - e^{-\lambda}}{\lambda}\left(|\mathrm{Var}(W) - \lambda| + 2\varepsilon\right) + c_\lambda|\delta - \lambda|, \tag{37}$$

while if the indicators $\{X_i : 1 \le i \le n\}$ are ALPD, then

$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le \frac{1 - e^{-\lambda}}{\lambda}\left(|\mathrm{Var}(W) - \lambda| + 2\varepsilon_*\right) + c_\lambda|\delta - \lambda|.$$

Before proving Theorem 3, we give an example of its application.

Example 4. We examine a variation of the classical birthday problem; see also Bar-
bour et al. (1992). Suppose we independently colour $N \ge 2$ points with one of
$m$ colours, each colour being chosen equiprobably. Let $\Gamma$ be the set of all subsets
$i \subseteq \{1, \ldots, N\}$ of size 2. For $i \in \Gamma$, let $Z_i$ be the indicator that the points indexed by $i$
have the same colour. Moreover, suppose we choose uniformly $r$ of the $|\Gamma| = \binom{N}{2}$ pairs
of points, independently of the colourings chosen. For $i \in \Gamma$, we let $\xi_i = 0$ if the pair
of points indexed by $i$ is chosen, and otherwise set $\xi_i = 1$.

Set $W = \sum_{i \in \Gamma} Z_i\xi_i$. This counts the number of pairs of points with the same
colour, excluding those $r$ pairs of points we have chosen. In the case where $r = 0$, this
corresponds to the classical birthday problem. A bound in the Poisson approximation
of $W$ in this case is given by Arratia et al. (1989, Example 2).

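We observe that for all distinct $i, j \in \Gamma$, $E[Z_i] = m^{-1}$ and $E[Z_iZ_j] = m^{-2}$, while

$$E[\xi_i] = \frac{\binom{N}{2} - r}{\binom{N}{2}}, \quad \text{and} \quad E[\xi_i\xi_j] = \frac{\binom{N}{2} - r}{\binom{N}{2}}\left(\frac{\binom{N}{2} - r - 1}{\binom{N}{2} - 1}\right).$$

Straightforward calculations then give

$$\lambda = E[W] = \frac{\binom{N}{2} - r}{m} \quad \text{and} \quad \lambda - \mathrm{Var}(W) = \frac{\binom{N}{2} - r}{m^2}.$$

Now, we write $W_i = W - Z_i\xi_i$ and choose

$$Y_i = \sum_{j \in \Gamma} Z_j\xi_jI_{(|i \cap j| = 1)}, \quad \text{and} \quad \delta_i = E[Z_i\xi_i].$$

The condition (35) holds true with these choices. Indeed, $W_i - Y_i$ is independent of
$Z_i$ and the $\xi_i$ are negatively related by construction. Thus, for all non-decreasing
functions $g$, we have

$$E[Z_i\xi_ig(W_i - Y_i)] = E[Z_i\xi_i]E[g(W_i - Y_i) \mid \xi_i = 1] \le E[Z_i\xi_i]E[g(W_i - Y_i)],$$

as required. We further see that

$$\varepsilon \le \frac{2(N - 1)\left\{\binom{N}{2} - r\right\}\left\{\binom{N}{2} - r - 1\right\}}{m^2\left\{\binom{N}{2} - 1\right\}}.$$

Evaluating (37) then gives the following bound.

Corollary 7. With $W$ as above,

$$d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le \frac{1 - e^{-\lambda}}{m}\left\{1 + 4(N - 1)\left(\frac{\binom{N}{2} - r - 1}{\binom{N}{2} - 1}\right)\right\}.$$

In the case $r = 0$, a bound of the same order was established by Arratia et al. (1989,
Example 2).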

4.4. Proof of Theorem 3

(i) Consider the ALND case. We suppose first that $f$ is any non-negative, non-
decreasing function. Arguing as for Lemma 3, we have

$$E[Wf(W)] = \sum_{i=1}^{n} E[X_if(W)] = \sum_{i=1}^{n} E[X_if(W_i + 1)]
= \sum_{i=1}^{n} E[X_if(W_i - Y_i + 1)] + \sum_{i=1}^{n} E\{X_i[f(W_i + 1) - f(W_i - Y_i + 1)]\},$$

which we denote by $T_1 + T_2$. We bound the sum $T_2$ by noting that

$$|f(x) - f(y)| \le \|\Delta f\|_\infty |x - y|,$$

which yields

$$T_2 \le \|\Delta f\|_\infty \sum_{i=1}^{n} E(X_iY_i) = \|\Delta f\|_\infty\varepsilon.$$

For the sum $T_1$, by (35) and since $f$ is non-decreasing, we get

$$T_1 \le \sum_{i=1}^{n} \delta_iE[f(W_i - Y_i + 1)] \le \sum_{i=1}^{n} \delta_iE[f(W + 1)] = \delta E[f(W + 1)].$$

Inserting these two bounds, we find that

$$E[Af(W)] = \lambda E[f(W + 1)] - E[Wf(W)] \ge -(\delta - \lambda)E[f(W + 1)] - \|\Delta f\|_\infty\varepsilon. \tag{38}$$

To get an upper bound, we define a function $\tilde{f}$ on $\{0, 1, \ldots, n - 1\}$ by

$$\tilde{f}(x) = \|f\|_\infty + \|\Delta f\|_\infty x - f(x). \tag{39}$$

Note that $\tilde{f}$ is, as $f$, a non-negative, non-decreasing function. By assumption, $EW = \lambda$
so that $E[A1] = 0$; observe also that $E[AW] = \lambda E[W + 1] - E[W^2] = -[\mathrm{Var}(W) - \lambda]$.
Thus,

$$E[A\tilde{f}(W)] = \|f\|_\infty E[A1] + \|\Delta f\|_\infty E[AW] - E[Af(W)]
= -\|\Delta f\|_\infty[\mathrm{Var}(W) - \lambda] - E[Af(W)].$$

On the other hand, (38) is applicable to the function $\tilde{f}$, so that

$$E[A\tilde{f}(W)] \ge -(\delta - \lambda)E[\tilde{f}(W + 1)] - \|\Delta\tilde{f}\|_\infty\varepsilon.$$

From these two formulas, we deduce that

$$E[Af(W)] \le \|\Delta\tilde{f}\|_\infty\varepsilon + (\delta - \lambda)E[\tilde{f}(W + 1)] + \|\Delta f\|_\infty|\mathrm{Var}(W) - \lambda|. \tag{40}$$

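Now, let $f$ be an arbitrary function. We start with the standard decomposition
$f = f_+ - f_-$, where $f_+$ and $f_-$ are non-negative, non-decreasing functions with, of
course,

$$\|\Delta^jf_+\|_\infty \le \|\Delta^jf\|_\infty, \quad \text{and} \quad \|\Delta^jf_-\|_\infty \le \|\Delta^jf\|_\infty, \quad j = 0, 1. \tag{41}$$

By (38) and (40), we obtain an upper bound

$$E[Af(W)] = E[Af_+(W)] - E[Af_-(W)]$$
$$\le \|\Delta\tilde{f}_+\|_\infty\varepsilon + (\delta - \lambda)E[\tilde{f}_+(W + 1)] + \|\Delta f_+\|_\infty|\mathrm{Var}(W) - \lambda|
+ (\delta - \lambda)E[f_-(W + 1)] + \|\Delta f_-\|_\infty\varepsilon$$
$$= \|\Delta f_+\|_\infty|\mathrm{Var}(W) - \lambda| + (\|\Delta\tilde{f}_+\|_\infty + \|\Delta f_-\|_\infty)\varepsilon
+ (\delta - \lambda)\{\|f_+\|_\infty + \|\Delta f_+\|_\infty(\lambda + 1) - E[f(W + 1)]\},$$

using (39) and $EW = \lambda$ for the last equality. By a similar method, we find as a lower
bound

$$E[Af(W)] \ge -(\delta - \lambda)E[f_+(W + 1)] - \|\Delta f_+\|_\infty\varepsilon
- \|\Delta\tilde{f}_-\|_\infty\varepsilon - (\delta - \lambda)E[\tilde{f}_-(W + 1)] - \|\Delta f_-\|_\infty|\mathrm{Var}(W) - \lambda|$$
$$= -\|\Delta f_-\|_\infty|\mathrm{Var}(W) - \lambda| - (\|\Delta f_+\|_\infty + \|\Delta\tilde{f}_-\|_\infty)\varepsilon
- (\delta - \lambda)\{\|f_-\|_\infty + \|\Delta f_-\|_\infty(\lambda + 1) + E[f(W + 1)]\}.$$

By (41) and since $\|\Delta\tilde{f}\|_\infty \le \|\Delta f\|_\infty$, combining the two previous bounds then yields

$$|E[Af(W)]| \le \|\Delta f\|_\infty(|\mathrm{Var}(W) - \lambda| + 2\varepsilon) + |\delta - \lambda|[2\|f\|_\infty + \|\Delta f\|_\infty(\lambda + 1)]. \tag{42}$$

With $f = Sh$, it now suffices to apply in (42) the standard bounds

$$\|\Delta Sh\|_\infty \le \lambda^{-1}(1 - e^{-\lambda})\|h\|_\infty, \quad \text{and} \quad \|Sh\|_\infty \le d_\lambda\|h\|_\infty,$$

which gives (37).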

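(ii) The ALPD case is dealt with analogously. For $f$ non-negative, non-decreasing,
we first write that

$$E[Wf(W)] = \sum_{i=1}^{n} E[X_if(W_i - Y_i + 1)] + \sum_{i=1}^{n} E[X_i\{f(W_i + 1) - f(W_i - Y_i + 1)\}]
\ge \sum_{i=1}^{n} E[X_if(W_i - Y_i + 1)] - \|\Delta f\|_\infty\varepsilon.$$

By (36), we then get that

$$E[Wf(W)] \ge \sum_{i=1}^{n} \delta_iE[f(W_i - Y_i + 1)] - \|\Delta f\|_\infty\varepsilon$$
$$= \delta E[f(W + 1)] - \sum_{i=1}^{n} \delta_iE[f(W + 1) - f(W_i - Y_i + 1)] - \|\Delta f\|_\infty\varepsilon$$
$$\ge \delta E[f(W + 1)] - \|\Delta f\|_\infty\sum_{i=1}^{n} \delta_iE(X_i + Y_i) - \|\Delta f\|_\infty\varepsilon$$
$$= \delta E[f(W + 1)] - \|\Delta f\|_\infty\varepsilon_*.$$

Overall, we find that

$$E[Af(W)] = \lambda E[f(W + 1)] - E[Wf(W)] \le -(\delta - \lambda)E[f(W + 1)] + \|\Delta f\|_\infty\varepsilon_*.$$

The rest of the proof follows as in the ALND case.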

5. Translated Poisson approximation 

We assume, as before, that $W = X_1 + \cdots + X_n$ is a sum of (possibly dependent)
indicator random variables, with $p_i = P(X_i = 1)$. Denote

$\lambda_k = \sum_{i=1}^{n} p_i^k$, $\lambda = \lambda_1 = E[W]$, and $\sigma^2 = \mathrm{Var}(W)$.

We are going to discuss the approximation of $W$ by a translated Poisson distribution.

5.1. Main results

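A random variable $Z$ has a translated Poisson distribution $\mathrm{TP}(\lambda, \sigma^2)$ if $Z$ is
distributed as $Z' + \rho$, where $Z' \sim \mathrm{Po}(\sigma^2 + \gamma)$ with

$\rho = \lambda - \sigma^2 - \gamma$, and $\gamma = \langle \lambda - \sigma^2 \rangle \in [0, 1)$,

$\langle x \rangle = x - \lfloor x \rfloor$ denoting the fractional part of $x$.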

We note that $E[Z] = \lambda$ and $\sigma^2 \le \mathrm{Var}(Z) = \sigma^2 + \gamma < \sigma^2 + 1$, so that our
approximating translated Poisson distribution has a mean equal to, and variance close
to, that of $W$. We would thus expect a closer approximation than could be obtained
by simply using the one-parameter Poisson distribution. The variances of W and Z 
cannot necessarily be made to match exactly, as we must shift our Poisson distribution 
by an integer. However, the error term arising from this mismatch does not adversely 
affect the order of the bounds we obtain, as we shall see below. 
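
The parameter choice for the translated Poisson distribution can be checked numerically. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes the reading $\rho = \lfloor \lambda - \sigma^2 \rfloor$, $\gamma = \langle \lambda - \sigma^2 \rangle$ of the definition. It computes the shift and the Poisson mean, and then recovers the mean and variance of $Z$ by direct summation.

```python
import math

def translated_poisson_params(lam, sigma2):
    """Return (rho, gamma, mu): integer shift, fractional part, Poisson mean."""
    diff = lam - sigma2
    rho = math.floor(diff)      # integer shift
    gamma = diff - rho          # fractional part, lies in [0, 1)
    return rho, gamma, sigma2 + gamma

def translated_poisson_moments(lam, sigma2, terms=400):
    """Mean and variance of Z = Z' + rho with Z' ~ Po(mu), by summation."""
    rho, _, mu = translated_poisson_params(lam, sigma2)
    mean = 0.0
    second = 0.0
    for k in range(terms):
        # Poisson(mu) mass at k, computed on the log scale for stability
        pk = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
        z = k + rho
        mean += z * pk
        second += z * z * pk
    return mean, second - mean ** 2
```

For instance, `translated_poisson_moments(5.0, 3.7)` should return a mean close to 5 and a variance between 3.7 and 4.7, in line with the remark that the variances need not match exactly.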

The following results give us bounds in translated Poisson approximation for $W$
under some stochastic ordering assumptions. We defer the proofs of Theorems 4 and
5 until Section 5.3, giving first some examples of their application, in Section 5.2.

Our bounds demonstrate convergence to a translated Poisson distribution if $\sigma \to \infty$
as $n \to \infty$. Bounds on the total variation distance between $\mathcal{L}(W)$ and a translated
Poisson random variable may still be found if this is not the case, but require a different
analysis of the error terms. For example, in proving Theorems 4 and 5 we write
$P(W - \rho < 0) \le \sigma^{-2}$. This error term may be reduced, or even omitted altogether
depending on the problem at hand, with a more careful analysis. This could give us
good bounds in cases where $\sigma \to \sigma_\infty < \infty$ as $n \to \infty$.

In the sequel, we let $W^s$ be a random variable having the $W$-size-biased distribution,
and $\nu_q$ be an indicator random variable, independent of all else, with $P(\nu_q = 1) = q$.
As before, we write $W_i = W - X_i$, $1 \le i \le n$, and for any random index $V$ we let
$W_V = W - X_V$.

Theorem 4. Suppose that $X_1, \ldots, X_n$ are negatively related, and there is $q \in [0,1]$
and $l \in \mathbb{Z}^+$ such that

$(W + 1 \mid X_k = 0) \le_{st} (W + l + \nu_q \mid X_k = 1)$, $1 \le k \le n$. (43)

Then,

$d_{TV}(\mathcal{L}(W), \mathrm{TP}(\lambda, \sigma^2)) \le \frac{2}{\sigma^2} + \frac{\lambda_2 + (l + q)(\lambda - \lambda_2)}{\lambda\sigma} + \frac{l(l + 2q - 1)(\lambda - \lambda_2)}{\sigma^2}\, d_{TV}(\mathcal{L}(W^s), \mathcal{L}(W^s + 1))$. (44)



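Theorem 5. Suppose that $X_1, \ldots, X_n$ are positively related, and there is $q \in [0,1]$ and
$l \in \mathbb{Z}^+$ such that

$(W + 1 \mid X_k = 0) \ge_{st} (W - l - \nu_q \mid X_k = 1)$, $1 \le k \le n$. (45)

Then,

$d_{TV}(\mathcal{L}(W), \mathrm{TP}(\lambda, \sigma^2)) \le \frac{2}{\sigma^2} + \frac{\lambda_2 + (l + q)(\lambda - \lambda_2)}{\lambda\sigma} + \frac{(l + 1)(l + 2q)(\lambda - \lambda_2)}{\sigma^2}\, d_{TV}(\mathcal{L}(W^s), \mathcal{L}(W^s + 1))$. (46)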

Consider the stochastic ordering assumptions (43) and (45). We note that the choice
of $l$ and $q$ is not unique, in that choosing $l = m$, $q = 1$ gives the same assumption as
choosing $l = m + 1$, $q = 0$. It is easily checked, however, that each of these choices
gives rise to the same bounds in (44) and (46). In the examples below, we will verify
the validity of such stochastic orderings by using an appropriate coupling argument.
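
The invariance under the two parametrisations can be checked mechanically. The sketch below is illustrative and rests on an assumption: it takes the $(l, q)$-dependent factors of the bounds to be $l + q$ (common to both), $l(l + 2q - 1)$ in (44) and $(l + 1)(l + 2q)$ in (46). The function name is hypothetical.

```python
def ordering_coefficients(l, q):
    """(l, q)-dependent factors, under the assumed reading of (44) and (46)."""
    common = l + q                       # multiplies (lambda - lambda_2)/(lambda * sigma)
    negative = l * (l + 2 * q - 1)       # smoothness factor assumed in (44)
    positive = (l + 1) * (l + 2 * q)     # smoothness factor assumed in (46)
    return common, negative, positive

# The choices (l, q) = (m, 1) and (m + 1, 0) give identical factors.
for m in range(5):
    assert ordering_coefficients(m, 1) == ordering_coefficients(m + 1, 0)
```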

5.2. Applications 

Example 5. Suppose that $X_1, \ldots, X_n$ are independent. Thus, they are also negatively
related. Moreover, the condition (43) is true for $q = l = 0$. Therefore, (44) is applicable
and yields the following.

Corollary 8. With $W$ as above,

$d_{TV}(\mathcal{L}(W), \mathrm{TP}(\lambda, \sigma^2)) \le \frac{\lambda_2}{\lambda\sigma} + \frac{2}{\sigma^2}$.

This bound is of the order we would expect: see also Čekanavičius and Vaitkus (2001).
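
As a numerical illustration of Corollary 8, the following sketch compares the exact total variation distance with the bound, read here as $\lambda_2/(\lambda\sigma) + 2/\sigma^2$. It is a hedged example, not part of the original argument: the function names are hypothetical, the translated Poisson parameters are taken as $\rho = \lfloor \lambda - \sigma^2 \rfloor$ and $\gamma = \langle \lambda - \sigma^2 \rangle$, and the Poisson tail is truncated.

```python
import math

def poisson_binomial_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_i), by convolution."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += mass * (1.0 - p)
            nxt[j + 1] += mass * p
        pmf = nxt
    return pmf

def translated_poisson_pmf(j, lam, sigma2):
    """pmf of TP(lam, sigma2) at the integer j."""
    rho = math.floor(lam - sigma2)
    mu = sigma2 + (lam - sigma2 - rho)
    k = j - rho
    if k < 0:
        return 0.0
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def tv_and_bound(ps, tail=200):
    """Return (truncated exact TV distance, bound as read from Corollary 8)."""
    lam = sum(ps)
    lam2 = sum(p * p for p in ps)
    sigma2 = lam - lam2                      # variance under independence
    pmf_w = poisson_binomial_pmf(ps)
    total = 0.0
    for j in range(len(ps) + tail + 1):      # rho >= 0 here, so start at 0
        pw = pmf_w[j] if j < len(pmf_w) else 0.0
        total += abs(pw - translated_poisson_pmf(j, lam, sigma2))
    bound = lam2 / (lam * math.sqrt(sigma2)) + 2.0 / sigma2
    return 0.5 * total, bound
```

Calling `tv_and_bound([0.3] * 50)` returns the distance for a Binomial(50, 0.3) variable together with the corresponding bound.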

Example 6. Suppose that m balls are placed into N urns, in such a way that no urn 
contains more than one ball and all arrangements are equally likely. Let W be the 
number of balls in the first n urns. Thus, W has a hypergeometric distribution with 

$\lambda = \frac{mn}{N}$, and $\sigma^2 = \frac{mn(N - m)(N - n)}{N^2(N - 1)}$.

We set $X_i$ to be the indicator that the $i$th urn contains a ball, so that $W = X_1 +
\cdots + X_n$. By construction, these indicators are negatively related. The condition (43)
holds for $q = 1$ and $l = 0$. To see this, we construct $(W + 1 \mid X_k = 0)$ by considering
the urns and excluding the $k$th. Distribute the $m$ balls in these $N - 1$ urns, such
that all arrangements are equally likely, and count the number of the first $n$ urns that
are occupied. Adding one to this count gives us our random variable. We then choose
(uniformly and independently of what has gone before) one of the occupied urns. Take
the ball from this urn and place it in urn $k$. This gives us $(W + 1 \mid X_k = 1)$. If the
ball chosen is from one of the first $n$ urns, the number of occupied urns is the same as
before. Otherwise, we have increased the number of occupied urns within the first $n$.
Evaluating the bound (44) then gives Corollary 9.

Corollary 9. For $W$ having our hypergeometric distribution,

$d_{TV}(\mathcal{L}(W), \mathrm{TP}(\lambda, \sigma^2)) \le \frac{1}{\sigma} + \frac{2}{\sigma^2} = \sqrt{\frac{N^2(N - 1)}{mn(N - m)(N - n)}} + \frac{2N^2(N - 1)}{mn(N - m)(N - n)}$.

Röllin (2007, Section 4.1) has considered translated Poisson approximation for the
hypergeometric distribution, and shows that if $m = O(n)$ and $N = O(n)$, then one
gets a bound in total variation distance of order $O(1/\sqrt{n})$. This order is also reflected
in our result.
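
The hypergeometric bound can be compared with the exact distance for a concrete choice of parameters. The sketch below is illustrative only: it assumes the bound of Corollary 9 reads $1/\sigma + 2/\sigma^2$, uses hypothetical function names, and truncates the Poisson tail.

```python
import math

def hypergeom_pmf(j, N, m, n):
    """P(W = j): occupied urns among the first n, with m balls in N urns."""
    if j < max(0, n + m - N) or j > min(m, n):
        return 0.0
    return math.comb(n, j) * math.comb(N - n, m - j) / math.comb(N, m)

def poisson_pmf(k, mu):
    """Poisson(mu) mass at k, zero for negative k."""
    if k < 0:
        return 0.0
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def hypergeom_tv_and_bound(N, m, n, tail=200):
    """Return (truncated exact TV distance, 1/sigma + 2/sigma^2)."""
    lam = m * n / N
    sigma2 = m * n * (N - m) * (N - n) / (N ** 2 * (N - 1))
    rho = math.floor(lam - sigma2)           # integer shift
    mu = sigma2 + (lam - sigma2 - rho)       # Poisson mean sigma^2 + gamma
    lo, hi = min(0, rho), max(n, rho) + tail
    total = sum(abs(hypergeom_pmf(j, N, m, n) - poisson_pmf(j - rho, mu))
                for j in range(lo, hi + 1))
    sigma = math.sqrt(sigma2)
    return 0.5 * total, 1.0 / sigma + 2.0 / sigma2
```

For example, `hypergeom_tv_and_bound(200, 100, 100)` evaluates both quantities for 100 balls in 200 urns, counting the first 100 urns.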

Example 7. Suppose $\xi_1, \ldots, \xi_n$ are i.i.d. Bernoulli random variables with

$p = P(\xi_i = 1) = 1 - P(\xi_i = 0)$, $1 \le i \le n$.

Fix an integer $k \ge 2$, and define

$X_i = \xi_i \xi_{i+1} \cdots \xi_{i+k-1}$, and $W = \sum_{i=1}^{n} X_i$,

in which, to avoid edge effects, all indices are treated modulo $n$. Thus, $W$ counts the
number of $k$-runs in our Bernoulli trials. Observe that

$\lambda = np^k$, $\lambda_2 = np^{2k}$, and $\sigma^2 = \frac{\lambda}{1 - p}\left(1 + p - p^k[2 + (2k - 1)(1 - p)]\right)$.

Translated Poisson approximation for $k$-runs was treated by Röllin (2005, Section 3.2),
who gives a bound in total variation distance of the form $K/\sqrt{n}$, for some constant
$K = K(k, p)$ independent of $n$. Barbour and Xia (1999, Section 5) also give a bound
of this order for 2-runs. We shall use our Theorem 5 to give an explicit bound with
this same order.
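
The moment formulas for circular $k$-runs can be confirmed by brute force for small $n$. The sketch below is an illustration with hypothetical function names; it enumerates all $2^n$ sequences and compares the exact mean and variance with $\lambda = np^k$ and $\sigma^2 = \frac{\lambda}{1-p}(1 + p - p^k[2 + (2k-1)(1-p)])$. It assumes $n \ge 2k$, so that runs started $k$ or more positions apart in either direction around the circle share no trial.

```python
from itertools import product

def count_k_runs(xi, k):
    """Number of circular k-runs of ones in the 0/1 sequence xi."""
    n = len(xi)
    return sum(all(xi[(i + r) % n] for r in range(k)) for i in range(n))

def k_run_moments_exact(n, k, p):
    """Exact mean and variance of W by enumerating all 2^n sequences."""
    mean = 0.0
    second = 0.0
    for xi in product((0, 1), repeat=n):
        prob = 1.0
        for x in xi:
            prob *= p if x else 1.0 - p
        w = count_k_runs(xi, k)
        mean += prob * w
        second += prob * w * w
    return mean, second - mean ** 2

def k_run_moments_formula(n, k, p):
    """Closed forms for the mean and variance stated in Example 7."""
    lam = n * p ** k
    sigma2 = lam / (1 - p) * (1 + p - p ** k * (2 + (2 * k - 1) * (1 - p)))
    return lam, sigma2
```

With `n = 10`, `k = 2` and `p = 0.3`, both functions should agree up to rounding error.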

It is easily seen that the variables $X_1, \ldots, X_n$ are positively related. The condition
(45) holds by choosing $q = 1$ and $l = 2k - 3$. To see that, consider the following
construction. Given the Bernoulli random variables $\xi_1, \ldots, \xi_n$, fix some $m \le n$ and set
$\xi_m = \xi_{m+1} = \cdots = \xi_{m+k-1} = 1$, while the others remain independent Bernoulli
random variables with parameter $p$. Counting the number of $k$-runs in these $n$
Bernoulli trials gives us $(W \mid X_m = 1)$. Suppose now we resample the random variables
$\xi_m, \ldots, \xi_{m+k-1}$, conditional on at least one of these being zero. Counting the number
of $k$-runs now gives us $(W \mid X_m = 0)$. In this resampling procedure, one can remove at
most $2k - 1$ of the $k$-runs that were originally present. Thus, our construction implies
that $(W \mid X_m = 0) + 2k - 1 \ge (W \mid X_m = 1)$, or, equivalently, $(W + 1 \mid X_m = 0) \ge
(W - 2k + 2 \mid X_m = 1)$, hence the announced values of $q$ and $l$.
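
The coupling just described can be simulated directly. The following sketch is illustrative, with hypothetical function names: it forces a window of $k$ ones starting at position $m$, then resamples that window conditional on it containing at least one zero (by rejection), and reports the two run counts. Since only the $2k - 1$ runs overlapping the window can be affected, the difference between the counts should never exceed $2k - 1$.

```python
import random

def count_k_runs(xi, k):
    """Number of circular k-runs of ones in the 0/1 sequence xi."""
    n = len(xi)
    return sum(all(xi[(i + r) % n] for r in range(k)) for i in range(n))

def coupled_run_counts(n, k, p, m, rng):
    """Return coupled samples of (W | X_m = 0) and (W | X_m = 1)."""
    xi = [1 if rng.random() < p else 0 for _ in range(n)]
    window = [(m + r) % n for r in range(k)]
    forced = list(xi)
    for i in window:
        forced[i] = 1                       # condition on X_m = 1
    w_given_one = count_k_runs(forced, k)
    while True:                             # rejection: at least one zero
        draw = [1 if rng.random() < p else 0 for _ in range(k)]
        if not all(draw):
            break
    resampled = list(forced)
    for i, value in zip(window, draw):
        resampled[i] = value                # condition on X_m = 0
    w_given_zero = count_k_runs(resampled, k)
    return w_given_zero, w_given_one
```

Repeating `coupled_run_counts(30, 3, 0.6, 4, random.Random(0))` over many draws gives an empirical view of the ordering used to verify (45).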

Following the earlier size-biasing construction, to construct $W^s$ we choose an index $V$ uniformly
from $\{1, \ldots, n\}$, and set $\xi_V = \xi_{V+1} = \cdots = \xi_{V+k-1} = 1$, while the other $\xi_i$ remain
independent Bernoulli random variables with parameter $p$. Lemma 2.1 of Wang and
Xia (2008) thus gives us that

$d_{TV}(\mathcal{L}(W^s), \mathcal{L}(W^s + 1)) \le 1 \wedge \frac{2.3}{\sqrt{(n - k - 1)p^k(1 - p)^3}}$.

Using this, Theorem 5 yields the following.

Corollary 10. Let $W$ count the number of $k$-runs in $n$ independent Bernoulli trials,
each with success probability $p$. Then,

$d_{TV}(\mathcal{L}(W), \mathrm{TP}(\lambda, \sigma^2)) \le \frac{2}{\sigma^2} + \frac{p^k + (2k - 2)(1 - p^k)}{\sigma} + \frac{(2k - 2)(2k - 1)np^k(1 - p^k)}{\sigma^2}\left(1 \wedge \frac{2.3}{\sqrt{(n - k - 1)p^k(1 - p)^3}}\right)$. (47)

Our bound (47) has the same order as that of Röllin (2005, Theorem 5) and Barbour
and Xia (1999, Theorem 5.2) (this latter result applying only to the 2-runs case).
Numerical comparison of the bounds shows that ours generally performs well compared
to these other bounds, often giving a better result. Table 1 gives some illustrations,
with values for comparison taken from Röllin (2005).
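
The entries of row (a) of Table 1 can be recomputed from the bound for $k$-runs. The sketch below is a hedged illustration: it assumes (47) is Theorem 5 evaluated at $q = 1$, $l = 2k - 3$ with the smoothness term $1 \wedge 2.3/\sqrt{(n-k-1)p^k(1-p)^3}$, and the function name is hypothetical.

```python
import math

def k_runs_bound(n, k, p):
    """Evaluate the assumed form of bound (47) for circular k-runs."""
    lam = n * p ** k
    lam2 = n * p ** (2 * k)
    sigma2 = lam / (1 - p) * (1 + p - p ** k * (2 + (2 * k - 1) * (1 - p)))
    sigma = math.sqrt(sigma2)
    l, q = 2 * k - 3, 1                      # values verified in Example 7
    # bound on d_TV(L(W^s), L(W^s + 1)) quoted from Wang and Xia (2008)
    smooth = min(1.0, 2.3 / math.sqrt((n - k - 1) * p ** k * (1 - p) ** 3))
    return (2.0 / sigma2
            + (lam2 + (l + q) * (lam - lam2)) / (lam * sigma)
            + (l + 1) * (l + 2 * q) * (lam - lam2) / sigma2 * smooth)
```

Under these assumptions, `k_runs_bound(10**6, 2, 0.5)` evaluates to approximately 0.0500, which agrees with the first entry of row (a) for $p = 0.50$.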
Table 1: Numerical comparisons for 2-runs. Upper bounds on total variation distance from
(a) our result (47), (b) Röllin (2005) and (c) Barbour and Xia (1999). Missing values are due
to restrictions on choice of parameters.

n             p = 0.10   p = 0.25   p = 0.50   p = 0.75   p = 0.90
10^6    (a)   0.1553     0.0675     0.0500     0.0814     0.2512
        (b)   0.4463     0.2334     0.1747     0.5528     > 1
        (c)   0.0304     -          0.1251     0.6014     -
10^8    (a)   0.0155     0.0067     0.0050     0.0081     0.0251
        (b)   0.0445     0.0233     0.0175     0.0553     0.2554
        (c)   0.0030     -          0.0125     0.0601     -
10^10   (a)   0.0016     0.0007     0.0005     0.0008     0.0025
        (b)   0.0045     0.0023     0.0017     0.0055     0.0255
        (c)   0.0003     -          0.0013     0.0060     -


5.3. Proof of Theorems 4 and 5

Our proof is based on that of Propositions 1 and 2, using the characterising
operator for the Poisson distribution. We find representations of our Stein equation in
conjunction with which our dependence and stochastic ordering assumptions may be
applied.

Throughout this section we let $f = Sh$ be the solution to the Stein equation (2)
with the choices $\alpha_j = \sigma^2 + \gamma$ and $\beta_j = j$, corresponding to the Poisson distribution
with mean $\sigma^2 + \gamma$. We suppose the test function $h$ has the form $h(j) = I(j \in B)$ for
some $B \subseteq \mathbb{Z}^+$. We write $g_B(j) = f(j - \rho)$. We note that $g_B$ depends on the choice of
set $B$, though for notational convenience we will often write simply $g$ for $g_B$. We note
further that bounds on the supremum norm of $f$ also apply to $g$, so that in particular
$\|\Delta g_B\|_\infty \le \sigma^{-2}$ for each $B \subseteq \mathbb{Z}^+$.

Following Röllin (2007, Section 3), we obtain from the Stein equation that

$d_{TV}(\mathcal{L}(W), \mathrm{TP}(\lambda, \sigma^2)) \le \sup_{B \subseteq \mathbb{Z}^+} |E[(\sigma^2 + \gamma)g_B(W + 1) - (W - \rho)g_B(W)]| + P(W - \rho < 0)$. (48)

One may bound $P(W - \rho < 0) \le \sigma^{-2}$ using Chebyshev's inequality. So, we now
concentrate on the first term on the right-hand side of (48). Throughout our proof,
we will make use of the following equalities in distribution:

$(W \mid X_V = 1) =_d W^s$, and $(W_V \mid X_V = 0) =_d (W \mid X_V = 0)$. (49)

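Step (1). For this part of the proof, we will consider separately the cases where
$\sigma^2 \le \lambda$ and $\sigma^2 > \lambda$. We begin by assuming $\sigma^2 \le \lambda$, so that $\rho \ge 0$. Recall that

$E[Wg(W)] = \lambda E[g(W^s)]$. (50)

Using (50), we can then write that

$E[(\sigma^2 + \gamma)g(W + 1) - (W - \rho)g(W)] = \lambda E[g(\widetilde{W}) - g(W^s)]$, (51)

where

$P(\widetilde{W} = j) = \lambda^{-1}\{(\sigma^2 + \gamma)P(W + 1 = j) + \rho P(W = j)\}$, $j \ge 0$.

That is, $\widetilde{W} = W + \nu_r$ where $\nu_r$ is a Bernoulli variable with success probability $r =
\lambda^{-1}(\sigma^2 + \gamma)$. Note that $r \le 1$ by assumption. We rewrite (51) as

$\lambda E[g(\widetilde{W}) - g(W^s)] = \lambda E[g(\widetilde{W}) - g(\widehat{W})] + \lambda E[g(\widehat{W}) - g(W^s)]$, (52)

by defining $\widehat{W} = W_V + 1$, where $V$ is a random index chosen according to (27). For
the first term in (52) we note that, by conditioning on $\nu_r$,

$\lambda E g(\widetilde{W}) = \lambda E g(W + \nu_r) = (\sigma^2 + \gamma)E\Delta g(W) + \lambda E g(W)$. (53)

Furthermore, by conditioning on $X_V$ and using the equalities (49),

$\lambda E g(\widehat{W}) = \lambda E g(W_V + 1) = \lambda_2 E g(W^s) + (\lambda - \lambda_2)E[g(\widehat{W}) \mid X_V = 0]$, (54)

since $P(X_V = 1) = \lambda^{-1}\lambda_2$. Again by considering conditioning on $X_V$ and using (49),
we have that

$(\lambda - \lambda_2)E[g(\widehat{W}) \mid X_V = 0] = \lambda E g(W + 1) - \lambda_2 E g(W^s + 1)$. (55)

Combining (53), (54) and (55) we obtain the following.

$\lambda E[g(\widetilde{W}) - g(\widehat{W})] = (\sigma^2 + \gamma - \lambda)E\Delta g(W) + \lambda_2 E\Delta g(W^s)$
$= \lambda_2 E[\Delta g(W^s) - \Delta g(W)] + \gamma E\Delta g(W) + (\sigma^2 - \lambda + \lambda_2)E\Delta g(W)$. (56)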
Now consider the second term of (52). Let us combine it with the final term of (56).
Since

$E[\widehat{W} - W^s] = -\lambda^{-1}(\sigma^2 - \lambda + \lambda_2)$,

and proceeding as we did in deriving (3), we get that

$\lambda E[g(\widehat{W}) - g(W^s)] + (\sigma^2 - \lambda + \lambda_2)E\Delta g(W) = \lambda E \sum_{j=0}^{\infty} (\Delta g(j) - \Delta g(W)) [P(\widehat{W} > j) - P(W^s > j)]$. (57)

Using the definition of $\widehat{W}$, conditioning on $X_V$ and employing (49), we have that

$\lambda[P(\widehat{W} > j) - P(W^s > j)] = (\lambda - \lambda_2)[P(W_V + 1 > j \mid X_V = 0) - P(W_V + 1 > j \mid X_V = 1)]$. (58)

Hence, the right-hand side of (57) becomes

$(\lambda - \lambda_2) E \sum_{j=0}^{\infty} (\Delta g(j) - \Delta g(W)) [P(W_V + 1 > j \mid X_V = 0) - P(W_V + 1 > j \mid X_V = 1)]$. (59)

Let us now insert the representations (56) and (59) into (51) and then (48). We obtain

$d_{TV}(\mathcal{L}(W), \mathrm{TP}(\lambda, \sigma^2)) \le (\lambda - \lambda_2) \sup_{B \subseteq \mathbb{Z}^+} \{A_B\} + \lambda_2 \sup_{B \subseteq \mathbb{Z}^+} |E[\Delta g_B(W^s) - \Delta g_B(W)]| + \gamma \sup_{B \subseteq \mathbb{Z}^+} |E\Delta g_B(W)| + P(W - \rho < 0)$,

where

$A_B = E \sum_{j=0}^{\infty} |\Delta g_B(j) - \Delta g_B(W)|\, |P(W_V + 1 > j \mid X_V = 0) - P(W_V + 1 > j \mid X_V = 1)|$.

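Recalling that $P(W - \rho < 0) \le \sigma^{-2}$, $\gamma < 1$ and $\|\Delta g_B\|_\infty \le \sigma^{-2}$, we have that

$\gamma \sup_{B \subseteq \mathbb{Z}^+} |E\Delta g_B(W)| + P(W - \rho < 0) \le 2\sigma^{-2}$.

Furthermore, the random variable $W^s$ having the $W$-size-biased distribution satisfies

$P(W^s = j) = \lambda^{-1} j P(W = j)$, $0 \le j \le n$,

and so,

$2d_{TV}(\mathcal{L}(W), \mathcal{L}(W^s)) = \sum_{j=0}^{\infty} |P(W = j) - P(W^s = j)| = E|1 - \lambda^{-1}W| \le \lambda^{-1}\sigma$. (60)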
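
The identity and inequality in (60) are easy to check numerically. The sketch below is an illustration with hypothetical function names: for a given probability mass function on $\{0, \ldots, n\}$ it builds the size-biased mass function $\lambda^{-1} j P(W = j)$, and returns $2d_{TV}(\mathcal{L}(W), \mathcal{L}(W^s))$, $E|1 - \lambda^{-1}W|$ and $\lambda^{-1}\sigma$.

```python
import math

def binomial_pmf(n, p):
    """Probability mass function of Binomial(n, p) as a list."""
    return [math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]

def size_bias_check(pmf):
    """Return (2 * TV distance, E|1 - W/lam|, sigma/lam) for the given pmf."""
    lam = sum(j * pj for j, pj in enumerate(pmf))
    var = sum(j * j * pj for j, pj in enumerate(pmf)) - lam ** 2
    size_biased = [j * pj / lam for j, pj in enumerate(pmf)]
    two_tv = sum(abs(a - b) for a, b in zip(pmf, size_biased))
    mean_abs = sum(abs(1 - j / lam) * pj for j, pj in enumerate(pmf))
    return two_tv, mean_abs, math.sqrt(var) / lam
```

The first two returned values coincide, and the third dominates them by the Cauchy-Schwarz inequality.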
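We thus have that

$\lambda_2 |E[\Delta g_B(W^s) - \Delta g_B(W)]| \le 2\lambda_2 \|\Delta g_B\|_\infty d_{TV}(\mathcal{L}(W), \mathcal{L}(W^s)) \le \frac{\lambda_2}{\lambda\sigma}$.

Combining the above bounds, we obtain

$d_{TV}(\mathcal{L}(W), \mathrm{TP}(\lambda, \sigma^2)) \le (\lambda - \lambda_2) \sup_{B \subseteq \mathbb{Z}^+} \{A_B\} + \frac{\lambda_2}{\lambda\sigma} + \frac{2}{\sigma^2}$. (61)

In the second step of the proof, we consider how $A_B$ may be bounded. Before doing
this, we show that if $\sigma^2 > \lambda$ then the bound (61) continues to hold.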

Consider now the case where $\sigma^2 > \lambda$, so that $\rho < 0$. We will use an analogous
argument to show that the bound (61) continues to hold. In place of (52), we this time
write

$E[(\sigma^2 + \gamma)g(W + 1) - (W - \rho)g(W)] = (\sigma^2 + \gamma)E[g(W + 1) - g(\bar{W})] + (\sigma^2 + \gamma)E[g(\bar{W}) - g(W^*)]$, (62)

where $\bar{W} = W + \nu_t(1 - X_V)$, $W^* = \nu_t W^s + (1 - \nu_t)W$ and $t = \lambda(\sigma^2 + \gamma)^{-1}$. Consider
the first term on the right-hand side of (62). For this term, we argue as we did to
derive (56). Conditioning on $\nu_t$ and $X_V$ and employing the equalities (49), we find, as
for (56), that

$(\sigma^2 + \gamma)E[g(W + 1) - g(\bar{W})] = \lambda_2 E[\Delta g(W^s) - \Delta g(W)] + \gamma E\Delta g(W) + (\sigma^2 - \lambda + \lambda_2)E\Delta g(W)$.

As we have that

$E[\bar{W} - W^*] = -(\sigma^2 + \gamma)^{-1}(\sigma^2 - \lambda + \lambda_2)$,

we then write

$(\sigma^2 + \gamma)E[g(\bar{W}) - g(W^*)] + (\sigma^2 - \lambda + \lambda_2)E\Delta g(W) = (\sigma^2 + \gamma) E \sum_{j=0}^{\infty} (\Delta g(j) - \Delta g(W)) [P(\bar{W} > j) - P(W^* > j)]$. (63)

Using the definitions of $\bar{W}$ and $W^*$, and conditioning on $\nu_t$, we find that

$P(\bar{W} > j) - P(W^* > j) = t\,[P(W_V + 1 > j) - P(W^s > j)]$.

Comparing this with (57), recalling the definition of $t$ and using (58), we find that (63)
also gives us a representation of (59). Continuing the argument as before, the bound
(61) holds too in the present case.

Step (2). In this part of the proof, we bound $A_B$, and thus obtain the bounds
of our theorems. In doing so, we will use our stochastic ordering and dependence
assumptions. The cases where $X_1, \ldots, X_n$ are positively and negatively related will
be discussed separately. In the positively related case, the argument of Lemma 2 shows
that

$P(W_V + 1 > j \mid X_V = 0) - P(W_V + 1 > j \mid X_V = 1) \le 0$, $j \ge 0$.

Noting that $(W_V + 1 \mid X_V = 1) =_d W^s$, we fix some $l \in \mathbb{Z}^+$ and write

$P(W_V + 1 > j \mid X_V = 1) - P(W_V + 1 > j \mid X_V = 0)$
$= P(W_V + 1 > j + l \mid X_V = 1) - P(W_V + 1 > j \mid X_V = 0) + \sum_{i=1}^{l} P(W_V + 1 = j + i \mid X_V = 1)$. (64)

Suppose now that there is some $q \in [0, 1]$ such that for each $j \ge 0$,

$P(W_V + 1 > j + l \mid X_V = 1) - P(W_V + 1 > j \mid X_V = 0) \le qP(W_V = j + l \mid X_V = 1)$ (65)
$= qP(W^s = j + l + 1)$. (66)

We will show presently that this is implied by the stochastic ordering assumption (45).
Using (64) and (66) we find that

$A_B \le qE|\Delta g_B(W^s - l - 1) - \Delta g_B(W)| + \sum_{i=1}^{l} E|\Delta g_B(W^s - i) - \Delta g_B(W)|$
$\le 2q\|\Delta g_B\|_\infty d_{TV}(\mathcal{L}(W), \mathcal{L}(W^s - l - 1)) + 2\|\Delta g_B\|_\infty \sum_{i=1}^{l} d_{TV}(\mathcal{L}(W), \mathcal{L}(W^s - i))$. (67)

Using our bound on $\|\Delta g_B\|_\infty$ and the triangle inequality for total variation distance,
the first term of (67) is bounded by

$2q\sigma^{-2}\{d_{TV}(\mathcal{L}(W), \mathcal{L}(W^s)) + (l + 1)d_{TV}(\mathcal{L}(W^s), \mathcal{L}(W^s + 1))\}$
$\le \frac{q}{\sigma^2}\left\{\frac{\sigma}{\lambda} + 2(l + 1)d_{TV}(\mathcal{L}(W^s), \mathcal{L}(W^s + 1))\right\}$, (68)

where this last inequality uses (60). Similarly, the second term of (67) may be bounded
by

$2\sigma^{-2} \sum_{i=1}^{l} \{d_{TV}(\mathcal{L}(W), \mathcal{L}(W^s)) + i\, d_{TV}(\mathcal{L}(W^s), \mathcal{L}(W^s + 1))\}$
$\le \frac{1}{\sigma^2}\left\{\frac{l\sigma}{\lambda} + l(l + 1)d_{TV}(\mathcal{L}(W^s), \mathcal{L}(W^s + 1))\right\}$. (69)

Combining (67), (68) and (69) with the bound (61) yields the desired inequality (46).

So, the proof of Theorem 5 is completed upon showing that the stochastic ordering
condition (45) implies the inequality (65). Writing

$P(W_V = j + l \mid X_V = 1) = P(W_V + 1 > j + l \mid X_V = 1) - P(W_V > j + l \mid X_V = 1)$,

for $0 \le j \le n$, it can be seen that (65) is equivalent to

$P(W_V + 1 > j \mid X_V = 0) \ge (1 - q)P(W_V + 1 - l > j \mid X_V = 1) + qP(W_V - l > j \mid X_V = 1)$,

for $j \ge 0$. This, in turn, is equivalent to the stochastic ordering

$(W + 1 \mid X_V = 0) \ge_{st} (1 - \nu_q)(W - l \mid X_V = 1) + \nu_q(W - l - 1 \mid X_V = 1)$, (70)

which can be seen using (49). Some rearranging shows that the stochastic ordering
assumption (45) implies the stochastic ordering (70), hence the result of Theorem 5.

We turn our attention now to the case of negative relation, and complete the proof
of Theorem 4. When $X_1, \ldots, X_n$ are negatively related, one can use a similar argument
to the above. We have here that

$P(W_V + 1 > j \mid X_V = 0) - P(W_V + 1 > j \mid X_V = 1) \ge 0$, $0 \le j \le n$.

Analogously to the positively related case, we write, for some fixed $l \in \mathbb{Z}^+$,

$P(W_V + 1 > j \mid X_V = 0) - P(W_V + 1 > j \mid X_V = 1)$
$= P(W_V + 1 > j \mid X_V = 0) - P(W_V + 1 > j - l \mid X_V = 1) + \sum_{i=0}^{l-1} P(W_V + 1 = j - i \mid X_V = 1)$.

This time, we suppose that there is $q \in [0, 1]$ such that

$P(W_V + 1 > j \mid X_V = 0) - P(W_V + 1 > j - l \mid X_V = 1) \le qP(W_V + l + 1 = j \mid X_V = 1)$. (71)

Following a similar argument to that used in the case of positive relation, we find that

$A_B \le \frac{l + q}{\lambda\sigma} + \frac{l(l + 2q - 1)}{\sigma^2}\, d_{TV}(\mathcal{L}(W^s), \mathcal{L}(W^s + 1))$.

Combining this with (61) gives us the desired inequality (44). It remains to show that
the stochastic ordering assumption (43) implies the inequality (71), which can be done
as above.



6. Another abstract approximation theorem 

Our aim hereafter is to consider an alternative approximation theorem which may
be found within the present framework. For concreteness, we suppose that the birth
rates $\alpha_j$ and death rates $\beta_j$ are such that the random variable $\pi$ has two parameters
available to choose. This will be the case in the application presented afterwards.

Let us return to the basic representation (12). To choose the two parameters of $\pi$,
it seems natural, in our context, to consider $s = 2$ and introduce the two conditions
$\alpha = \beta$ and $EW_\alpha = EW_\beta$ (i.e., $E[\alpha_W(W + 1)] = E[\beta_W W]$). With these choices, the
representation (12) becomes

$Eh(W) - Eh(\pi) = \alpha \sum_{i=0}^{\infty} \Delta^2 f(i)\, E[(W_\alpha - i - 1)_+ - (W_\beta - i - 1)_+]$. (72)

Moreover, suppose that one can construct $W_\alpha$ and $W_\beta$ on the same probability space
in such a way that $W_\beta = W_\alpha + Y$ for some random variable $Y$ which takes values in the
set $\{-1, 0, 1\}$. Under this assumption, $E[W_\alpha] = E[W_\beta] = E[W_\alpha + Y]$, which implies
$E[Y] = 0$. It is easily seen that the representation (72) can be rewritten as

$Eh(W) - Eh(\pi) = -\alpha \sum_{i=0}^{\infty} \Delta^2 f(i)\, E[Y I(W_\alpha - 1 \ge i + 1) + Y_+ I(W_\alpha - 1 = i)]$
$= -\alpha E[I(Y = 1)\Delta^2 f(W_\alpha - 1) + Y\Delta f(W_\alpha - 1)]$. (73)

Noting that

$|E[I(Y = 1)\Delta^2 f(W_\alpha - 1)]| \le 2\|\Delta f\|_\infty d_{TV}(\mathcal{L}(W_\alpha), \mathcal{L}(W_\alpha + 1)) \sup\{P(Y = 1 \mid W_\alpha)\}$,

$|E[Y\Delta f(W_\alpha - 1)]| \le \|\Delta f\|_\infty E|E[Y \mid W_\alpha]| \le \|\Delta f\|_\infty \sqrt{\mathrm{Var}(E[Y \mid W_\alpha])}$,

we can immediately bound the right-hand side of (73) to obtain the following.
Proposition 4. Suppose that $\alpha = \beta$ and $EW_\alpha = EW_\beta$. If $W_\alpha$ and $W_\beta$ can be
constructed on the same probability space such that

$W_\beta = W_\alpha + Y$ for some random variable $Y$ valued in $\{-1, 0, 1\}$, (74)

then,

$|Eh(W) - Eh(\pi)| \le 2\alpha\|\Delta Sh\|_\infty d_{TV}(\mathcal{L}(W_\alpha), \mathcal{L}(W_\alpha + 1)) \sup\{P(Y = 1 \mid W_\alpha)\} + \alpha\|\Delta Sh\|_\infty \sqrt{\mathrm{Var}(E[Y \mid W_\alpha])}$. (75)

Clearly, if such a random variable $Y$ takes values on a bounded set other than
$\{-1, 0, 1\}$, a representation analogous to (73) may still be found, and a result analogous
to Proposition 4 is available. We now apply our Proposition 4 to approximate a sum
of independent indicator random variables.

Example 8. Suppose that $W = X_1 + \cdots + X_n$ is the sum of independent Bernoulli
random variables with success probabilities $p_i$, $1 \le i \le n$. Brown and Xia (2001, Section
3) showed that in this case, one can improve on Poisson or binomial approximation for
$W$ by using a so-called polynomial birth-death distribution, with the choices $\alpha_j = \alpha$
and $\beta_j = \gamma j + j(j - 1)$ for some constants $\alpha$ and $\gamma$.

We will follow that approach and choose here $\alpha$ and $\gamma$ such that $\alpha = \beta$ and
$E[\alpha_W(W + 1)] = E[\beta_W W]$. Straightforward computations then give us expressions
for these parameters:

$\gamma = \lambda^2\lambda_2^{-1} - 1 - 2\lambda + 2\lambda_3\lambda_2^{-1}$, and $\alpha = \gamma\lambda + \lambda^2 - \lambda_2$, (76)

where $\lambda_k = \sum_{i=1}^{n} p_i^k$ and $\lambda = \lambda_1$ (as in Section 5). Note that the parameter
choices (76) are the same as those employed by Brown and Xia (2001), who based their
selection on minimising the error bound obtained in their result.
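
The two moment conditions behind the parameter choice can be verified by exact computation. The following sketch is illustrative, with hypothetical function names: it reads the parameters as $\gamma = \lambda^2/\lambda_2 - 1 - 2\lambda + 2\lambda_3/\lambda_2$ and $\alpha = \gamma\lambda + \lambda^2 - \lambda_2$, computes the distribution of $W$ by convolution, and returns the residuals of $\alpha = E[\beta_W]$ and $E[\alpha(W + 1)] = E[\beta_W W]$ with $\beta_j = \gamma j + j(j - 1)$.

```python
def poisson_binomial_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_i), by convolution."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += mass * (1.0 - p)
            nxt[j + 1] += mass * p
        pmf = nxt
    return pmf

def pbd_parameters(ps):
    """Return (alpha, gamma) under the assumed reading of (76)."""
    lam = sum(ps)
    lam2 = sum(p ** 2 for p in ps)
    lam3 = sum(p ** 3 for p in ps)
    gamma = lam ** 2 / lam2 - 1 - 2 * lam + 2 * lam3 / lam2
    alpha = gamma * lam + lam ** 2 - lam2
    return alpha, gamma

def moment_residuals(ps):
    """Residuals of alpha = E[beta_W] and E[alpha (W + 1)] = E[beta_W W]."""
    alpha, gamma = pbd_parameters(ps)
    pmf = poisson_binomial_pmf(ps)
    beta = [gamma * j + j * (j - 1) for j in range(len(pmf))]
    mean_beta = sum(b * pj for b, pj in zip(beta, pmf))
    lhs = sum(alpha * (j + 1) * pj for j, pj in enumerate(pmf))
    rhs = sum(b * j * pj for j, (b, pj) in enumerate(zip(beta, pmf)))
    return alpha - mean_beta, lhs - rhs
```

Both residuals should vanish up to rounding error for any choice of success probabilities.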

To begin with, let us prove that the condition (74) is satisfied. Since the birth rate
is constant (as in the Poisson case), we again have that $W_\alpha = W + 1$. Let us turn our
attention to $W_\beta$. We let $W_i = W - X_i$, and $W_{i,j} = W - X_i - X_j$, $1 \le i, j \le n$, and
observe that $W(W - 1) = \sum_{1 \le i \ne j \le n} X_i X_j$. By the definition of $W_\beta$, we get that

$P(W_\beta = k) = \alpha^{-1}E\{[\gamma W + W(W - 1)]I(W = k)\}$
$= \alpha^{-1}\left[\gamma \sum_{i=1}^{n} p_i P(W_i + 1 = k) + \sum_{1 \le i \ne j \le n} p_i p_j P(W_{i,j} + 2 = k)\right]$,

for $1 \le k \le n$. In the spirit of the earlier size-biasing construction, we define now
two random indices $T, U \in \{1, \ldots, n\}$ chosen according to the distribution

$P(T = i, U = j) = \frac{p_i p_j}{\lambda^2 - \lambda_2}$, $i \ne j$, and $P(T = U = i) = 0$.

Recall also the definition (27) of the random index $V$. Combining these definitions
with the above, we may write

$P(W_\beta = k) = \alpha^{-1}\gamma\lambda P(W + 1 - X_V = k) + \alpha^{-1}(\lambda^2 - \lambda_2)P(W + 2 - X_T - X_U = k)$,

for $1 \le k \le n$. Let $q = \alpha^{-1}\gamma\lambda$; note from (76) that $0 \le q \le 1$ whenever $\gamma \ge 0$. In
the sequel we will assume that this is indeed the case. Introduce a Bernoulli random
variable $\nu_q$ with success probability $q$, independent of all other entries. We may then
write

$W_\beta = \nu_q(W + 1 - X_V) + (1 - \nu_q)(W + 2 - X_T - X_U) = W + 1 + Y = W_\alpha + Y$,

where

$Y = (1 - \nu_q)(1 - X_T - X_U) - \nu_q X_V$, (77)

$Y$ being valued in $\{-1, 0, 1\}$ with $E[Y] = 0$, as desired.

Now, let us evaluate the bound (75). First, we need a bound on the solution $f$ of
the Stein equation in this situation. By Theorem 2.10 of Brown and Xia (2001), one
knows that

$\sup\{\|\Delta Sh\|_\infty : h(j) = I(j \in B),\ B \subseteq \mathbb{Z}^+\} \le \alpha^{-1}$. (78)

Further, W being a sum of independent indicators, one has (from Barbour and Jensen 
(1989, Lemma 1)) 

dTv{C{W),C{W + l)) < ^ =. (79) 



Finally, consider the two conditional terms in (75). Note from (77) that $Y = 1$ if and
only if $\nu_q = X_T = X_U = 0$, so that

$P(Y = 1 \mid W) = (1 - q)P(X_T = X_U = 0 \mid W) = (1 - q)E[(1 - X_T)(1 - X_U) \mid W]$.

This probability takes its greatest value when $W = 0$, with $E[(1 - X_i)(1 - X_j) \mid W =
0] = 1$ for all $i$ and $j$. Hence,

$\sup\{P(Y = 1 \mid W)\} = \alpha^{-1} \sum_{i \ne j} p_i p_j = \alpha^{-1}(\lambda^2 - \lambda_2)$. (80)

Now, let $\|Z\| = (E[Z^2])^{1/2}$ be the $L_2$ norm for any random variable $Z$. Since $T =_d U$
and $E[Y] = 0$, we write

$E[Y \mid W] = -q(E[X_V \mid W] - E[X_V]) - 2(1 - q)(E[X_T \mid W] - E[X_T])$,

and thus

$\sqrt{\mathrm{Var}(E[Y \mid W])} = \|E[Y \mid W]\|$
$\le q \sum_{j=1}^{n} \|E[X_j \mid W] - E[X_j]\| P(V = j) + 2(1 - q) \sum_{i=1}^{n} \|E[X_i \mid W] - E[X_i]\| P(T = i)$
$\le (q + 2(1 - q)) \max_{1 \le j \le n} \sqrt{\mathrm{Var}(E[X_j \mid W])}$.

When $p_j = p$ for $j = 1, \ldots, n$, $E[X_j \mid W] = W/n$ and so the bound becomes the equality

$\sqrt{\mathrm{Var}(E[Y \mid W])} = (2 - q)\sqrt{\mathrm{Var}(W/n)}$. (81)

Inserting (78), (79), (80) and (81) in (75) then provides the following bound:

dTv{C{W),C{7^)) < —^+^^—^ = Oip/VX), 
(1 — p)a n 

where $\sigma^2 = \mathrm{Var}(W) = np(1 - p)$.

By exploring the explicit structure of the auxiliary variable $Y$, it is possible to
derive better bounds. Throughout this part we let $\bar{a} = 1 - a$ for any $a \in \mathbb{R}$ and
$\sigma_k = \sqrt{\sum_{i=k+1}^{n} \rho_i}$, where $\rho_i$ is the $i$th largest number of $p_1(1 - p_1), \ldots, p_n(1 - p_n)$.
From Barbour and Jensen (1989, Lemma 1) we have that for all $i, j = 1, \ldots, n$ and
$i \ne j$,

$2d_{TV}(\mathcal{L}(W_i), \mathcal{L}(W_i + 1)) \le \sigma_1^{-1}$ and $2d_{TV}(\mathcal{L}(W_{i,j}), \mathcal{L}(W_{i,j} + 1)) \le \sigma_2^{-1}$.

Notice that, from representation (77),

$I(Y = 1) = \bar{\nu}_q \bar{X}_T \bar{X}_U$, $I(Y = -1) = \nu_q X_V + \bar{\nu}_q X_T X_U$. (82)

The derivations below are based on the conditional independence of $X_T$ and $W_T$, given
$T$, and similarly $X_U$ and $W_U$, given $U$, and $X_V$ and $W_V$, given $V$. By substituting (82) in
(73), integrating with respect to $\nu_q$, separating linear and quadratic terms and noticing
that $T =_d U$, we derive, after some simple calculations,

$I = Eh(W) - Eh(\pi)$
$= -\alpha E[\bar{\nu}_q \bar{X}_T \bar{X}_U \Delta f(W + 1)] + \alpha E[(\nu_q X_V + \bar{\nu}_q X_T X_U)\Delta f(W)]$
$= -\alpha\bar{q} E[X_T X_U \Delta^2 f(W)] + 2\alpha\bar{q} E[X_T \Delta^2 f(W)] - \alpha(\bar{q} E[\Delta f(W + 1)] - E[(2\bar{q} X_T + qX_V)\Delta f(W)])$
$= I_1 + I_2 + I_3$.

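Using the conditional independence of $W_{T,U}$ and $X_T$, $X_U$ given $T$ and $U$, the first term
$I_1$ is bounded by

$|I_1| = \alpha\bar{q}\,|E\,E[X_T X_U \mid T, U]\,E[\Delta^2 f(W_{T,U} + 2) \mid T, U]|$
$\le 2\alpha\bar{q}\|\Delta f\|_\infty E[X_T X_U] \max_{i \ne j}\{d_{TV}(\mathcal{L}(W_{i,j}), \mathcal{L}(W_{i,j} + 1))\} \le \frac{\lambda_2^2 - \lambda_4}{\alpha\sigma_2}$.

By conditioning on $T$,

$|I_2| = 2\alpha\bar{q}\,|E\,E[X_T \mid T]\,E[\Delta^2 f(W_T + 1) \mid T]|$
$\le 4\alpha\bar{q}\|\Delta f\|_\infty E[X_T] \max_i\{d_{TV}(\mathcal{L}(W_i), \mathcal{L}(W_i + 1))\} \le \frac{2(\lambda\lambda_2 - \lambda_3)}{\alpha\sigma_1}$.

To bound $I_3$, we first notice that since $E[Y] = 0$,

$\bar{q} = 2\bar{q}E[X_T] + qE[X_V]$.

Thus,

$|I_3| = |2\alpha\bar{q}E\{X_T(E[\Delta f(W_T + 1) \mid T] - E[\Delta f(W_T + X_T + 1)])\} + \alpha qE\{X_V(E[\Delta f(W_V + 1) \mid V] - E[\Delta f(W_V + X_V + 1)])\}|$
$\le 2\alpha\{2\bar{q}E[\{E(X_T \mid T)\}^2] + qE[\{E(X_V \mid V)\}^2]\}\|\Delta f\|_\infty \max_i\{d_{TV}(\mathcal{L}(W_i), \mathcal{L}(W_i + 1))\}$
$\le \frac{2(\lambda\lambda_3 - \lambda_4)}{\alpha\sigma_1} + \frac{\gamma\lambda_3}{\alpha\sigma_1}$.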

By combining the bounds on Ii , /2 and I3 we derive the following. 
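Proposition 5. With $W$ and $\pi$ as above,

$d_{TV}(\mathcal{L}(W), \mathcal{L}(\pi)) \le \frac{\lambda_2^2 - \lambda_4}{\alpha\sigma_2} + \frac{2(\lambda\lambda_2 - \lambda_3)}{\alpha\sigma_1} + \frac{2(\lambda\lambda_3 - \lambda_4)}{\alpha\sigma_1} + \frac{\gamma\lambda_3}{\alpha\sigma_1}$. (83)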



Let us conclude by comparing our result with that of Brown and Xia (2001, Theorem 
3.1), who obtain 



When $p_i = p \to 0$ for each $i$ and $\lambda \to \infty$, both the bounds (83) and (84) are
asymptotically equivalent to $3p^2/\sqrt{\lambda}$.